Private by default
Prompts, completions and critiques never leave your machine. No cloud round-trips, no telemetry, no accounts.
MultiMind orchestrates your local LLMs with three reasoning architectures - sequential pipelines, parallel expert councils, and full organisational hierarchies - so a 3B model on your laptop can match the depth of a frontier system.
Every design choice is in service of a single idea: give small, private models the structured scaffolding that frontier APIs get for free.
Sequential pipeline, parallel council, or hierarchical org - pick the structure that fits the problem, not the model.
Auto-discovers Ollama and LM Studio, then lets you map any model to any reasoning step.
Every plan, critique and delegation streams in real time as collapsible thought blocks - the model's work is never a black box.
Force a small model through the same discipline a senior engineer applies: plan the work, execute the plan, then critique the output and rewrite the final answer. Each stage runs with its own tuned system prompt and can use a different model.
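The plan / execute / critique / rewrite discipline can be sketched as a simple staged loop. This is an illustrative sketch only: `call_model` is a placeholder for a real call to the local runner, and the stage prompts and the `models` mapping are assumptions, not MultiMind's actual prompts or API.

```python
import asyncio

# Hypothetical sketch of a plan -> execute -> critique -> rewrite pipeline.
# `call_model` stands in for a real Ollama / LM Studio completion call;
# the stage prompts and routing below are illustrative, not MultiMind's.

STAGES = [
    ("plan",     "You are a planner. Break the task into concrete steps."),
    ("execute",  "You are an engineer. Carry out the plan faithfully."),
    ("critique", "You are a reviewer. List flaws in the draft."),
    ("rewrite",  "You are an editor. Produce the final answer, fixing the flaws."),
]

async def call_model(model: str, system: str, prompt: str) -> str:
    # Placeholder: a real implementation would POST to the local model server.
    return f"[{model}/{system.split('.')[0]}] {prompt[:40]}"

async def sequential_pipeline(task: str, models: dict[str, str]) -> str:
    context = task
    for stage, system_prompt in STAGES:
        # Each stage may route to a different local model.
        context = await call_model(models.get(stage, "default"), system_prompt, context)
    return context
```

Because each stage only sees the previous stage's output, a different (and differently tuned) model can own each step without any shared state.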
Get independent expert opinions in parallel, then have a Lead Judge synthesise them into one reasoned answer. Council members can be different models - mix a coding-tuned 2B with a general 3B and a reasoning-focused 4B to get genuinely diverse perspectives.
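The fan-out-then-judge pattern maps naturally onto `asyncio.gather`. A minimal sketch, assuming a generic `ask` coroutine per member; the member names and the judge's synthesis prompt are hypothetical:

```python
import asyncio

# Illustrative council sketch: every member answers independently and
# concurrently, then a Lead Judge model synthesises the opinions.
# `ask` is a placeholder for a call to the local model server.

async def ask(model: str, question: str) -> str:
    return f"{model} says: consider {question!r}"

async def council(question: str, members: list[str], judge: str) -> str:
    # Fan out to all members at once; no member sees another's answer.
    opinions = await asyncio.gather(*(ask(m, question) for m in members))
    briefing = "\n".join(opinions)
    # The judge sees all opinions and produces one reasoned answer.
    return await ask(judge, f"Synthesise these opinions:\n{briefing}")
```

Keeping the members independent (rather than chaining them) is what preserves genuinely diverse perspectives before the judge weighs them.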
For problems too big for one prompt, spin up a virtual company. A CEO decomposes the request into tickets, department heads delegate to specialist employees, employees execute, and results propagate up the hierarchy for synthesis.
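The decompose / delegate / execute / propagate flow reduces to a tree walk. A toy sketch with a fixed two-level org and string stand-ins for model calls, purely to show how results bubble back up; the structure and ticket format are assumptions:

```python
# Toy hierarchical delegation: the CEO splits the request into tickets,
# each department head hands its ticket to workers, and reports are
# merged upward level by level. Fixed fan-out of 2 is illustrative only.

def run_org(request: str) -> str:
    tickets = [f"{request} :: ticket {i}" for i in range(2)]       # CEO decomposes
    dept_reports = []
    for ticket in tickets:
        results = [f"done({ticket}/worker{w})" for w in range(2)]  # employees execute
        dept_reports.append(" + ".join(results))                   # head summarises
    return " | ".join(dept_reports)                                # top-level synthesis
```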
No framework-of-the-week, no build step, no telemetry SDK. FastAPI streams NDJSON to vanilla JS. Easy to read, easy to fork, easy to trust.
All reasoning runs through async generators, so tokens stream the instant the model emits them.
Each step label, token, and final answer is a single JSON line - trivial to proxy, log, or replay.
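The one-JSON-object-per-line shape is what makes the stream trivial to proxy or replay. A sketch of what producing and consuming such a stream looks like; the field names (`type`, `step`, `text`) are assumptions, not MultiMind's actual schema:

```python
import json

# Hypothetical NDJSON event stream: one JSON object per line.
events = [
    {"type": "step",  "step": "plan"},
    {"type": "token", "text": "First, "},
    {"type": "token", "text": "outline the API."},
    {"type": "final", "text": "First, outline the API."},
]

stream = "\n".join(json.dumps(e) for e in events)

# Any consumer (proxy, logger, replayer) just splits on newlines and
# parses each line independently - no framing protocol needed.
parsed = [json.loads(line) for line in stream.splitlines()]
final = next(e["text"] for e in parsed if e["type"] == "final")
```

Because each line is self-delimiting, a logger can `tee` the stream to disk and a replayer can re-emit it line by line without understanding the payload.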
One HTML file, one stylesheet, one JS module. No bundler, no package-lock to audit.
On startup we probe Ollama and LM Studio, so a fresh install usually just works.
The client watches for repeated suffixes mid-stream and stops generation early - vital on small models.
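A repetition guard of this kind can be implemented as a suffix check over the accumulated text. MultiMind's client does this in JavaScript; the sketch below shows the same idea in Python, with the window size and repeat threshold as assumed defaults rather than MultiMind's actual values:

```python
# Minimal mid-stream repetition guard: flag the generation as looping
# when the trailing `window` characters repeat `repeats` times in a row.
# Thresholds here are illustrative, not MultiMind's tuned values.

def is_looping(text: str, window: int = 20, repeats: int = 3) -> bool:
    if len(text) < window * repeats:
        return False
    suffix = text[-window:]
    return text.endswith(suffix * repeats)
```

Run on every chunk, this lets the client stop generation as soon as a small model falls into a repeat loop instead of burning tokens until the limit.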
Org Mode tracks every delegation with assignees, parent goals, and status - no database to run.
MultiMind runs entirely on your machine against your local model server. There is no MultiMind server, no telemetry, no background sync, no "optional" analytics.
MultiMind ships its own benchmark harness alongside lm-evaluation-harness. These numbers are from an ongoing run on consumer hardware - not cherry-picked marketing.
MultiMind assumes you already have a local model runner. If you don't, grab Ollama - it's a one-click install on macOS, Windows and Linux.
Download Ollama or LM Studio, then pull at least one model (for example, qwen3.5:0.8b).

Install from PyPI. Python 3.10+ is the only prerequisite; there is no JavaScript build step.

    pip install multimind

Start the server and open the UI. MultiMind will auto-detect your local runner and preselect the first available model.

    multimind

Install once, own the full reasoning stack on your own machine. MultiMind stays out of your way - and out of the network.