MultiMind
Local-first reasoning OS

Make small local models reason like big ones.

MultiMind orchestrates your local LLMs with three reasoning architectures - sequential pipelines, parallel expert councils, and full organisational hierarchies - so a 3B model on your laptop can match the depth of a frontier system.

  • Runs 100% locally
  • Open source (MIT)
  • No API keys, ever
  • Ollama & LM Studio ready
[Screenshot] The MultiMind workspace showing streaming reasoning steps.
Why MultiMind

Four hard guarantees, one wrapper.

Every design choice is in service of a single idea: give small, private models the structured scaffolding that frontier APIs get for free.

Private by default

Prompts, completions and critiques never leave your machine. No cloud round-trips, no telemetry, no accounts.

Three modes in one UI

Sequential pipeline, parallel council, or hierarchical org - pick the structure that fits the problem, not the model.

Any local model

Auto-discovers Ollama and LM Studio, then lets you map any model to any reasoning step.

Streaming transparency

Every plan, critique and delegation streams in real time as collapsible thought blocks - the model's work is never a black box.

01 · Sequential

Thinking Pipeline

Force a small model through the same discipline a senior engineer applies: plan the work, execute the plan, then critique the output and rewrite the final answer. Each stage runs with its own tuned system prompt and can use a different model.

  • Off
    Direct mode. Single-shot inference; the pipeline gets out of the way for short, factual prompts.
  • Medium
    Plan → Execute → Polish. Light planning and a polishing pass. Best for everyday tasks where structure helps but speed matters.
  • Hard
    Plan → Execute → Critique. Full three-stage reasoning with a strict adversarial critique. The critique stage audits the execution for errors and rewrites the final answer.
[Diagram] Thinking Pipeline: Plan (qwen3.5:2b, decompose) → Execute (qwen3.5:0.8b, implement) → Critique (qwen3.5:2b, audit & rewrite) → Answer. Each stage can use a different model, with its own system prompt.
Plan → Execute → Critique. Different models can own different steps.
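The staged hand-off above can be sketched in a few lines. This is a minimal illustration, not MultiMind's actual API: `call_model` is a hypothetical stand-in for a streaming call to Ollama or LM Studio, and the stage prompts are placeholders.

```python
import asyncio

# Hypothetical stand-in for a local model call; the real app streams
# each stage from Ollama or LM Studio with a tuned system prompt.
async def call_model(model: str, system: str, prompt: str) -> str:
    return f"[{model}] {prompt[:30]}"

# The "Hard" pipeline: plan, execute, then critique-and-rewrite.
# Stage names and model tags mirror the diagram above.
STAGES = [
    ("plan", "qwen3.5:2b", "Decompose the task into numbered steps."),
    ("execute", "qwen3.5:0.8b", "Carry out the plan step by step."),
    ("critique", "qwen3.5:2b", "Audit the execution and rewrite the answer."),
]

async def run_pipeline(user_prompt: str) -> dict[str, str]:
    context, outputs = user_prompt, {}
    for name, model, system in STAGES:
        # Each stage consumes the previous stage's output as its prompt,
        # so discipline is enforced by structure, not by model size.
        context = await call_model(model, system, context)
        outputs[name] = context
    return outputs

outputs = asyncio.run(run_pipeline("Write a binary search"))
```

Because each stage is just a `(name, model, system_prompt)` tuple, swapping a bigger model into the critique slot is a one-line change.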
02 · Parallel

Agent Council

Get independent expert opinions in parallel, then have a Lead Judge synthesise them into one reasoned answer. Council members can be different models - mix a coding-tuned 2B with a general 3B and a reasoning-focused 4B to get genuinely diverse perspectives.

  • Advisors
    N independent experts. Each advisor sees the user prompt fresh and produces a structured answer in isolation - no chain-of-thought contamination.
  • Judge
    Lead synthesiser. Reads every advisor's output, resolves conflicts, removes duplicates, and writes the final answer.
  • Stream
    Live timeline. Each advisor's reasoning streams into its own collapsible block so you can compare perspectives side by side.
[Diagram] Agent Council: the user prompt fans out to Advisors A-D in parallel; the Lead Judge synthesises the final answer.
Advisors deliberate in parallel; the Lead Judge decides.
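The fan-out-then-judge shape is a natural fit for `asyncio.gather`. A minimal sketch, with the advisor and judge model calls stubbed out (the real app streams each advisor from its own local model):

```python
import asyncio

# Hypothetical stand-in for one advisor's isolated answer.
async def advise(advisor: str, prompt: str) -> str:
    return f"{advisor}: opinion on {prompt!r}"

async def judge(opinions: list[str]) -> str:
    # The Lead Judge reads every advisor's output and synthesises
    # one answer; here we just join them to show the data flow.
    return " | ".join(opinions)

async def run_council(prompt: str, advisors: list[str]) -> str:
    # Every advisor sees the user prompt fresh, concurrently, with no
    # shared chain of thought between them.
    opinions = await asyncio.gather(*(advise(a, prompt) for a in advisors))
    return await judge(list(opinions))

answer = asyncio.run(run_council("Best sorting algorithm?", ["A", "B", "C"]))
```

Because the advisors are independent coroutines, mixing models of different sizes and specialities costs nothing extra in wall-clock time beyond the slowest advisor.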
03 · Hierarchical

Organisation Mode

For problems too big for one prompt, spin up a virtual company. A CEO decomposes the request into tickets, department heads delegate to specialist employees, employees execute, and results propagate up the hierarchy for synthesis.

  • CEO
    Decomposition & synthesis. Breaks the prompt into department-sized work items and, at the end, integrates every result into one cohesive answer.
  • Dept
    Delegation. Department heads split their ticket into employee-sized subtasks and route each to the right specialist role.
  • Staff
    Execution. Employees deliver focused, scoped work. Every ticket is tracked with status, assignee, and goal ancestry.
[Diagram] Organisation Mode: the CEO delegates to Research (Analyst, Scout), Engineering (Dev, QA) and Writing (Drafter, Editor).
A ticket system tracks every delegation with goal ancestry and status.
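The top-down delegation and bottom-up synthesis can be sketched as two nested loops. This is an illustrative toy, assuming the org chart from the diagram; the role names and functions are not MultiMind's real identifiers:

```python
# Hypothetical org chart mirroring the diagram: department -> employees.
ORG = {
    "Research": ["Analyst", "Scout"],
    "Engineering": ["Dev", "QA"],
    "Writing": ["Drafter", "Editor"],
}

def employee_execute(role: str, subtask: str) -> str:
    # An employee delivers focused, scoped work on its subtask.
    return f"{role} done: {subtask}"

def department_run(dept: str, ticket: str) -> list[str]:
    # The department head splits its ticket into employee-sized subtasks
    # and routes each to the right specialist role.
    return [employee_execute(role, f"{ticket}/{role}") for role in ORG[dept]]

def ceo_run(prompt: str) -> str:
    # The CEO decomposes the prompt into department tickets, then
    # integrates every result into one answer at the end.
    results = []
    for dept in ORG:
        results.extend(department_run(dept, f"{prompt}:{dept}"))
    return "; ".join(results)

report = ceo_run("launch plan")
```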
Under the hood

Boring tech. Intentionally.

No framework-of-the-week, no build step, no telemetry SDK. FastAPI streams NDJSON to vanilla JS. Easy to read, easy to fork, easy to trust.

FastAPI + async Python

All reasoning runs through async generators, so tokens stream the instant the model emits them.
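The shape of that streaming path is an async generator yielding one event per token. A minimal sketch (event field names here are illustrative, not MultiMind's actual wire format):

```python
import asyncio
import json
from typing import AsyncIterator

# An async generator yields each NDJSON line the moment the token is
# available, instead of buffering the whole completion.
async def stream_tokens(tokens: list[str]) -> AsyncIterator[str]:
    for i, tok in enumerate(tokens):
        yield json.dumps({"type": "token", "index": i, "text": tok}) + "\n"
    yield json.dumps({"type": "done"}) + "\n"

async def collect() -> list[str]:
    # A consumer (e.g. a response writer) just iterates with async for.
    return [line async for line in stream_tokens(["Hello", ",", " world"])]

lines = asyncio.run(collect())
```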

NDJSON over SSE

Each step label, token, and final answer is a single JSON line - trivial to proxy, log, or replay.
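That one-JSON-object-per-line property is what makes the protocol trivial to consume: any client, proxy, or log replayer just splits on newlines. A sketch with illustrative event names:

```python
import json

# A captured NDJSON stream: every event is exactly one JSON line.
# These field names are illustrative, not the exact wire format.
raw = (
    '{"type": "label", "step": "plan"}\n'
    '{"type": "token", "text": "Hel"}\n'
    '{"type": "token", "text": "lo"}\n'
    '{"type": "answer", "text": "Hello"}\n'
)

# Parsing is a one-liner; no framing, no length prefixes, no SSE parser.
events = [json.loads(line) for line in raw.splitlines() if line]
answer = next(e["text"] for e in events if e["type"] == "answer")
```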

Zero-dep frontend

One HTML file, one stylesheet, one JS module. No bundler, no package-lock to audit.

Auto-discovery

On startup we probe Ollama and LM Studio, so a fresh install usually just works.
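Discovery amounts to probing the runners' default local endpoints and keeping whichever respond. A sketch assuming the usual defaults (Ollama on port 11434, LM Studio on 1234); the probe is injectable so the example runs without a live server:

```python
from urllib.request import urlopen

# Default local endpoints for the two supported runners (assumed here;
# both ports are configurable in the respective apps).
CANDIDATES = {
    "ollama": "http://localhost:11434/api/tags",
    "lmstudio": "http://localhost:1234/v1/models",
}

def reachable(url: str, timeout: float = 0.5) -> bool:
    # A short-timeout GET: any HTTP response means the runner is up.
    try:
        with urlopen(url, timeout=timeout):
            return True
    except OSError:
        return False

def discover(probe=reachable) -> list[str]:
    return [name for name, url in CANDIDATES.items() if probe(url)]

# Stubbed usage: pretend only Ollama answered.
found = discover(probe=lambda url: "11434" in url)
```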

Repetition guard

The client watches for repeated suffixes mid-stream and stops generation early - vital on small models.
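One simple way to implement such a guard is to check whether the stream's trailing window repeats itself verbatim. A minimal sketch (the window and repeat-count thresholds are illustrative, not MultiMind's tuned values):

```python
# Repetition guard sketch: flag the stream as looping when its last
# `window` characters occur `times` times in a row at the tail.
def is_looping(text: str, window: int = 12, times: int = 3) -> bool:
    if len(text) < window * times:
        return False  # not enough text to contain `times` full windows
    suffix = text[-window:]
    # The tail loops iff the last `times` windows are all identical.
    return text.endswith(suffix * times)

# Checked after each streamed chunk; generation stops on the first hit.
```

Small models often fall into exact verbatim loops, which is why a cheap suffix check on the client side catches most runaway generations without any model-side support.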

In-memory ticket system

Org Mode tracks every delegation with assignees, parent goals, and status - no database to run.
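The essence of that store fits in one dataclass: each delegation is a record with an assignee, a status, and a parent pointer that yields the goal ancestry. The field names below are a plausible sketch, not the actual schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shape of one in-memory ticket record.
@dataclass
class Ticket:
    id: int
    goal: str
    assignee: str
    status: str = "open"
    parent: Optional["Ticket"] = None

    def ancestry(self) -> list[str]:
        # Walk parent links up to the root (CEO-level) goal.
        chain, node = [], self
        while node:
            chain.append(node.goal)
            node = node.parent
        return chain[::-1]

root = Ticket(1, "write report", "CEO")
dept = Ticket(2, "gather sources", "Research", parent=root)
task = Ticket(3, "scan papers", "Analyst", parent=dept)
task.status = "done"
```

Keeping tickets as plain in-memory objects is what lets Org Mode run with no database: the whole delegation tree lives and dies with the session.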

Local-first

Your data never moves.

MultiMind runs entirely on your machine against your local model server. There is no MultiMind server, no telemetry, no background sync, no "optional" analytics.

  • No cloud API calls
  • No API keys to manage
  • No telemetry, ever
  • Works fully offline
Measured, not marketed

Structured reasoning lifts small models.

MultiMind ships its own benchmark harness alongside lm-evaluation-harness. These numbers are from an ongoing run on consumer hardware - not cherry-picked marketing.

Thinking modes · MacBook Pro M4
Benchmarks in progress
[Chart] Benchmark results for MultiMind reasoning modes.
Get started

Running in under two minutes.

MultiMind assumes you already have a local model runner. If you don't, grab Ollama - it's a one-click install on macOS, Windows and Linux.

  1. Install a local runner

    Download Ollama or LM Studio, then pull at least one model (for example, qwen3.5:0.8b).

    ollama pull qwen3.5:0.8b
  2. Install MultiMind

    Install from PyPI. Python 3.10+ is the only prerequisite; there is no JavaScript build step.

    pip install multimind
  3. Launch the app

    Start the server and open the UI. MultiMind will auto-detect your local runner and preselect the first available model.

    multimind

Stop waiting on cloud APIs.

Install once, own the full reasoning stack on your own machine. MultiMind stays out of your way - and out of the network.