MultiMind
Local-first reasoning OS

Make small local models reason like big ones.

MultiMind orchestrates your local LLMs with three reasoning architectures - sequential pipelines, parallel expert councils, and full organisational hierarchies - so a 3B model on your laptop can match the depth of a frontier system.

  • Runs 100% locally
  • Open source (MIT)
  • No API keys, ever
  • Ollama & LM Studio ready
[Screenshot] The MultiMind workspace showing streaming reasoning steps.
Why MultiMind

Four hard guarantees, one wrapper.

Every design choice is in service of a single idea: give small, private models the structured scaffolding that frontier APIs get for free.

Private by default

Prompts, completions and critiques never leave your machine. No cloud round-trips, no telemetry, no accounts.

Three modes in one UI

Sequential pipeline, parallel council, or hierarchical org - pick the structure that fits the problem, not the model.

Any local model

Auto-discovers Ollama and LM Studio, then lets you map any model to any reasoning step.

Streaming transparency

Every plan, critique and delegation streams in real time as collapsible thought blocks - the model's work is never a black box.

01 · Sequential

Thinking Pipeline

Force a small model through the same discipline a senior engineer applies: plan the work, execute the plan, then critique the output and rewrite the final answer. Each stage runs with its own tuned system prompt and can use a different model.

  • Off
    Direct mode. Single-shot inference; the pipeline gets out of the way for short, factual prompts.
  • Medium
    Plan → Execute → Polish. Light planning and a polishing pass. Best for everyday tasks where structure helps but speed matters.
  • Hard
    Plan → Execute → Critique. Full three-stage reasoning with a strict adversarial critique. The critique stage audits the execution for errors and rewrites the final answer.
[Diagram] Thinking Pipeline: Plan (qwen3.5:2b, decompose) → Execute (qwen3.5:0.8b, implement) → Critique (qwen3.5:2b, audit & rewrite) → Answer. Each stage can use a different model, with its own system prompt.
Plan → Execute → Critique. Different models can own different steps.
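The staged hand-off above can be sketched in a few lines. This is a minimal illustration, not MultiMind's actual API: `call_model` is a hypothetical stand-in for a streaming call to Ollama or LM Studio, and the stage prompts are placeholders.

```python
import asyncio

# Hypothetical stand-in for a local model call; the real app streams
# each stage from Ollama or LM Studio with a tuned system prompt.
async def call_model(model: str, system: str, prompt: str) -> str:
    return f"[{model}] {prompt[:30]}"

# The "Hard" pipeline: plan, execute, then critique-and-rewrite.
# Stage names and model tags mirror the diagram above.
STAGES = [
    ("plan", "qwen3.5:2b", "Decompose the task into numbered steps."),
    ("execute", "qwen3.5:0.8b", "Carry out the plan step by step."),
    ("critique", "qwen3.5:2b", "Audit the execution and rewrite the answer."),
]

async def run_pipeline(user_prompt: str) -> dict[str, str]:
    context, outputs = user_prompt, {}
    for name, model, system in STAGES:
        # Each stage consumes the previous stage's output as its prompt,
        # so discipline is enforced by structure, not by model size.
        context = await call_model(model, system, context)
        outputs[name] = context
    return outputs

outputs = asyncio.run(run_pipeline("Write a binary search"))
```

Because each stage is just a `(name, model, system_prompt)` tuple, swapping a bigger model into the critique slot is a one-line change.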
02 · Parallel

Agent Council

Get independent expert opinions in parallel, then have a Lead Judge synthesise them into one reasoned answer. Council members can be different models - mix a coding-tuned 2B with a general 3B and a reasoning-focused 4B to get genuinely diverse perspectives.

  • Advisors
    N independent experts. Each advisor sees the user prompt fresh and produces a structured answer in isolation - no chain-of-thought contamination.
  • Judge
    Lead synthesiser. Reads every advisor's output, resolves conflicts, removes duplicates, and writes the final answer.
  • Stream
    Live timeline. Each advisor's reasoning streams into its own collapsible block so you can compare perspectives side by side.
[Diagram] Agent Council: the user prompt fans out to Advisors A-D in parallel; the Lead Judge synthesises the final answer.
Advisors deliberate in parallel; the Lead Judge decides.
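The fan-out-then-judge shape is a natural fit for `asyncio.gather`. A minimal sketch, with the advisor and judge model calls stubbed out (the real app streams each advisor from its own local model):

```python
import asyncio

# Hypothetical stand-in for one advisor's isolated answer.
async def advise(advisor: str, prompt: str) -> str:
    return f"{advisor}: opinion on {prompt!r}"

async def judge(opinions: list[str]) -> str:
    # The Lead Judge reads every advisor's output and synthesises
    # one answer; here we just join them to show the data flow.
    return " | ".join(opinions)

async def run_council(prompt: str, advisors: list[str]) -> str:
    # Every advisor sees the user prompt fresh, concurrently, with no
    # shared chain of thought between them.
    opinions = await asyncio.gather(*(advise(a, prompt) for a in advisors))
    return await judge(list(opinions))

answer = asyncio.run(run_council("Best sorting algorithm?", ["A", "B", "C"]))
```

Because the advisors are independent coroutines, mixing models of different sizes and specialities costs nothing extra in wall-clock time beyond the slowest advisor.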
03 · Hierarchical

Organisation Mode

For problems too big for one prompt, spin up a virtual company. A CEO decomposes the request into tickets, department heads delegate to specialist employees, employees execute, and results propagate up the hierarchy for synthesis.

  • CEO
    Decomposition & synthesis. Breaks the prompt into department-sized work items and, at the end, integrates every result into one cohesive answer.
  • Dept
    Delegation. Department heads split their ticket into employee-sized subtasks and route each to the right specialist role.
  • Staff
    Execution. Employees deliver focused, scoped work. Every ticket is tracked with status, assignee, and goal ancestry.
[Diagram] Organisation Mode: the CEO delegates to Research (Analyst, Scout), Engineering (Dev, QA) and Writing (Drafter, Editor).
A ticket system tracks every delegation with goal ancestry and status.
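The top-down delegation and bottom-up synthesis can be sketched as two nested loops. This is an illustrative toy, assuming the org chart from the diagram; the role names and functions are not MultiMind's real identifiers:

```python
# Hypothetical org chart mirroring the diagram: department -> employees.
ORG = {
    "Research": ["Analyst", "Scout"],
    "Engineering": ["Dev", "QA"],
    "Writing": ["Drafter", "Editor"],
}

def employee_execute(role: str, subtask: str) -> str:
    # An employee delivers focused, scoped work on its subtask.
    return f"{role} done: {subtask}"

def department_run(dept: str, ticket: str) -> list[str]:
    # The department head splits its ticket into employee-sized subtasks
    # and routes each to the right specialist role.
    return [employee_execute(role, f"{ticket}/{role}") for role in ORG[dept]]

def ceo_run(prompt: str) -> str:
    # The CEO decomposes the prompt into department tickets, then
    # integrates every result into one answer at the end.
    results = []
    for dept in ORG:
        results.extend(department_run(dept, f"{prompt}:{dept}"))
    return "; ".join(results)

report = ceo_run("launch plan")
```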
Under the hood

Boring tech. Intentionally.

No framework-of-the-week, no build step, no telemetry SDK. FastAPI streams NDJSON to vanilla JS. Easy to read, easy to fork, easy to trust.

FastAPI + async Python

All reasoning runs through async generators, so tokens stream the instant the model emits them.
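The shape of that streaming path is an async generator yielding one event per token. A minimal sketch (event field names here are illustrative, not MultiMind's actual wire format):

```python
import asyncio
import json
from typing import AsyncIterator

# An async generator yields each NDJSON line the moment the token is
# available, instead of buffering the whole completion.
async def stream_tokens(tokens: list[str]) -> AsyncIterator[str]:
    for i, tok in enumerate(tokens):
        yield json.dumps({"type": "token", "index": i, "text": tok}) + "\n"
    yield json.dumps({"type": "done"}) + "\n"

async def collect() -> list[str]:
    # A consumer (e.g. a response writer) just iterates with async for.
    return [line async for line in stream_tokens(["Hello", ",", " world"])]

lines = asyncio.run(collect())
```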

NDJSON over SSE

Each step label, token, and final answer is a single JSON line - trivial to proxy, log, or replay.
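That one-JSON-object-per-line property is what makes the protocol trivial to consume: any client, proxy, or log replayer just splits on newlines. A sketch with illustrative event names:

```python
import json

# A captured NDJSON stream: every event is exactly one JSON line.
# These field names are illustrative, not the exact wire format.
raw = (
    '{"type": "label", "step": "plan"}\n'
    '{"type": "token", "text": "Hel"}\n'
    '{"type": "token", "text": "lo"}\n'
    '{"type": "answer", "text": "Hello"}\n'
)

# Parsing is a one-liner; no framing, no length prefixes, no SSE parser.
events = [json.loads(line) for line in raw.splitlines() if line]
answer = next(e["text"] for e in events if e["type"] == "answer")
```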

Zero-dep frontend

One HTML file, one stylesheet, one JS module. No bundler, no package-lock to audit.

Auto-discovery

On startup we probe Ollama and LM Studio, so a fresh install usually just works.
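Discovery amounts to probing the runners' default local endpoints and keeping whichever respond. A sketch assuming the usual defaults (Ollama on port 11434, LM Studio on 1234); the probe is injectable so the example runs without a live server:

```python
from urllib.request import urlopen

# Default local endpoints for the two supported runners (assumed here;
# both ports are configurable in the respective apps).
CANDIDATES = {
    "ollama": "http://localhost:11434/api/tags",
    "lmstudio": "http://localhost:1234/v1/models",
}

def reachable(url: str, timeout: float = 0.5) -> bool:
    # A short-timeout GET: any HTTP response means the runner is up.
    try:
        with urlopen(url, timeout=timeout):
            return True
    except OSError:
        return False

def discover(probe=reachable) -> list[str]:
    return [name for name, url in CANDIDATES.items() if probe(url)]

# Stubbed usage: pretend only Ollama answered.
found = discover(probe=lambda url: "11434" in url)
```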

Repetition guard

The client watches for repeated suffixes mid-stream and stops generation early - vital on small models.
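One simple way to implement such a guard is to check whether the stream's trailing window repeats itself verbatim. A minimal sketch (the window and repeat-count thresholds are illustrative, not MultiMind's tuned values):

```python
# Repetition guard sketch: flag the stream as looping when its last
# `window` characters occur `times` times in a row at the tail.
def is_looping(text: str, window: int = 12, times: int = 3) -> bool:
    if len(text) < window * times:
        return False  # not enough text to contain `times` full windows
    suffix = text[-window:]
    # The tail loops iff the last `times` windows are all identical.
    return text.endswith(suffix * times)

# Checked after each streamed chunk; generation stops on the first hit.
```

Small models often fall into exact verbatim loops, which is why a cheap suffix check on the client side catches most runaway generations without any model-side support.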

In-memory ticket system

Org Mode tracks every delegation with assignees, parent goals, and status - no database to run.
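The essence of that store fits in one dataclass: each delegation is a record with an assignee, a status, and a parent pointer that yields the goal ancestry. The field names below are a plausible sketch, not the actual schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shape of one in-memory ticket record.
@dataclass
class Ticket:
    id: int
    goal: str
    assignee: str
    status: str = "open"
    parent: Optional["Ticket"] = None

    def ancestry(self) -> list[str]:
        # Walk parent links up to the root (CEO-level) goal.
        chain, node = [], self
        while node:
            chain.append(node.goal)
            node = node.parent
        return chain[::-1]

root = Ticket(1, "write report", "CEO")
dept = Ticket(2, "gather sources", "Research", parent=root)
task = Ticket(3, "scan papers", "Analyst", parent=dept)
task.status = "done"
```

Keeping tickets as plain in-memory objects is what lets Org Mode run with no database: the whole delegation tree lives and dies with the session.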

Local-first

Your data never moves.

MultiMind runs entirely on your machine against your local model server. There is no MultiMind server, no telemetry, no background sync, no "optional" analytics.

  • No cloud API calls
  • No API keys to manage
  • No telemetry, ever
  • Works fully offline
Measured, not marketed

Structured reasoning lifts small models.

MultiMind ships its own benchmark harness alongside lm-evaluation-harness. These numbers are from an ongoing run on consumer hardware - not cherry-picked marketing.

Thinking modes · MacBook Pro M4
Benchmarks in progress
[Chart] Benchmark results for MultiMind reasoning modes.
Get started

Running in under two minutes.

MultiMind assumes you already have a local model runner. If you don't, grab Ollama - it's a one-click install on macOS, Windows and Linux.

  1. Install a local runner

    Download Ollama or LM Studio, then pull at least one model (for example, qwen3.5:0.8b).

    ollama pull qwen3.5:0.8b
  2. Install MultiMind

    Install from PyPI. Python 3.10+ is the only prerequisite; there is no JavaScript build step.

    pip install multimind
  3. Launch the app

    Start the server and open the UI. MultiMind will auto-detect your local runner and preselect the first available model.

    multimind

Stop waiting on cloud APIs.

Install once, own the full reasoning stack on your own machine. MultiMind stays out of your way - and out of the network.