🌸 Inspired by Petals & BitTorrent

Run AI in a herd,
not a data centre

OpenHydra splits big language models across volunteer laptops — BitTorrent-style. Your machine serves a slice of Qwen 3.5. You earn HYDRA tokens. Nobody needs a credit card. No cloud required.

$ git clone https://github.com/openhydra-ai/openhydra.git
Get started → ⇩ Desktop app · Soon
🍏

Calling all Mac Mini “OpenClaw” buyers

Did you buy a stack of M4 Mac Minis to run local models? Welcome home. OpenHydra’s architecture is explicitly designed to pool Apple Silicon’s Unified Memory across the internet. Leave your Mac running in the background, seed the swarm, and let your hardware earn HYDRA credits while you sleep.


The five-year-old version

OK but what actually is this?

Glad you asked. It’s honestly not that complicated once you stop calling it “AI infrastructure”.

🎒
Big models are heavy

A 70-billion parameter model weighs ~140 GB. That’s not fitting on your laptop. But split across 8 laptops? Now we’re talking.

🌊
Swarm it, don’t hoard it

Like BitTorrent, everyone in the herd serves a shard. Your laptop handles one piece of the inference. Together the whole model runs.
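The arithmetic behind the split is simple enough to sketch. This is a back-of-the-envelope illustration, not OpenHydra's actual placement algorithm — the layer counts and the fp16 (2 bytes per parameter) assumption are ours:

```python
# Back-of-the-envelope sharding: divide a model's transformer layers
# evenly across peers and estimate per-peer memory at fp16.
# Illustrative only -- not OpenHydra's real scheduler.

def shard_layers(n_layers: int, n_peers: int) -> list[range]:
    """Assign a contiguous block of layers to each peer."""
    base, extra = divmod(n_layers, n_peers)
    shards, start = [], 0
    for peer in range(n_peers):
        size = base + (1 if peer < extra else 0)
        shards.append(range(start, start + size))
        start += size
    return shards

def per_peer_gb(n_params: float, n_peers: int, bytes_per_param: int = 2) -> float:
    """Rough memory each peer must hold, in GB."""
    return n_params * bytes_per_param / n_peers / 1e9

print(per_peer_gb(70e9, 1))  # ~140 GB: the whole 70B model on one machine
print(per_peer_gb(70e9, 8))  # ~17.5 GB per peer across 8 laptops
print(shard_layers(80, 8))   # e.g. 80 layers -> 10 consecutive layers each
```

Quantisation shrinks these numbers further, but the principle is the same: each peer only ever loads its own contiguous slice.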

🪙
And you earn credits

Your node earns barter credits and HYDRA tokens for every request it serves. These are in-network credits — not crypto, not fiat — redeemable for inference on the swarm. A mystery-shopper bot checks quality. Good llamas get priority routing. Cheaters get slashed.

Small models like Qwen 3.5 0.8B run on a single laptop. Bigger ones like Qwen 3 72B need 8 peers. The default install gets you going with Qwen 3.5 immediately — no beefy GPU required.


How it works

Three steps to joining the herd

It’s three commands. Even your grandma could do it. (We’re not sure why your grandma would want distributed AI, but we respect the ambition.)

01
Install and set up

Clone, virtualenv, compile protobufs. Standard Python setup. You’ve done worse.

git clone https://github.com/openhydra-ai/openhydra.git
cd openhydra && make venv && source .venv/bin/activate
make install && make proto
02
Start your node — auto-joins the global swarm

One command. Your node automatically connects to bootstrap nodes on three continents (EU, US, AP) via the Hivemind Kademlia DHT. The default model is Qwen 3.5 0.8B — lightweight enough to run on a potato, smart enough to actually be useful.

openhydra-node --peer-id your-name
03
Chat. Earn. Repeat.

Use the OpenAI-compatible API, earn HYDRA tokens for every request your node serves, and quietly feel good about contributing to decentralised AI.

curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"openhydra-qwen3.5-0.8b","messages":[{"role":"user","content":"hi"}]}'

What you get

Features, listed professionally

We also have a proper features page in the docs, but here’s the version where we’re allowed to be slightly smug.

Drop-in OpenAI & Ollama API

Change one URL. Your existing code works. /v1/chat/completions with SSE streaming, plus Ollama-compatible /api/chat for Open WebUI and Continue.dev.
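"Change one URL" in practice: any OpenAI-compatible client will do. Here's a stdlib-only sketch that posts to a local node — the endpoint, port, and model name are taken from this page's own examples and haven't been verified against a live node:

```python
# Minimal OpenAI-compatible client against a local OpenHydra node.
# Endpoint and model name follow the quick-start examples on this page.
import json
from urllib import request

BASE_URL = "http://localhost:8080/v1"

def chat_payload(prompt: str) -> bytes:
    """JSON body for POST /v1/chat/completions."""
    return json.dumps({
        "model": "openhydra-qwen3.5-0.8b",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()

def chat(prompt: str) -> str:
    req = request.Request(
        BASE_URL + "/chat/completions",
        data=chat_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("hi"))
```

Swap the stdlib call for the official OpenAI SDK by pointing its `base_url` at the same address; existing code should need no other changes.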

🧠
KV cache compaction

4-phase Attention Matching keeps long conversations alive without nuking your VRAM. Based on arXiv:2602.16284 — we read the papers so you don’t have to.

🔗
Dual-stack DHT routing

HTTP DHT + Hivemind Kademlia across three continents. Auto-join on startup. If one bootstrap goes down, the llamas find another way. No single point of failure.
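The failover logic is the easy part to picture: try each bootstrap in turn and join via the first one that answers. A minimal sketch — the addresses and the liveness probe are placeholders, not OpenHydra's real endpoints:

```python
# Multi-continent bootstrap with failover: walk the list and join via
# the first reachable peer. Addresses below are hypothetical.
from typing import Callable, Optional

BOOTSTRAP_PEERS = [
    "bootstrap-eu.example.net:31337",
    "bootstrap-us.example.net:31337",
    "bootstrap-ap.example.net:31337",
]

def pick_bootstrap(peers: list[str], is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable bootstrap peer, or None if all are down."""
    for peer in peers:
        if is_alive(peer):
            return peer
    return None

# With the EU node down, the swarm falls through to the US node:
down = {"bootstrap-eu.example.net:31337"}
print(pick_bootstrap(BOOTSTRAP_PEERS, lambda p: p not in down))
# -> bootstrap-us.example.net:31337
```

Once any bootstrap answers, the Kademlia DHT takes over peer discovery, so the bootstrap list only matters at join time.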

🖥
Desktop node app Coming Soon

Tauri v2 app for macOS, Windows, and Linux. Click “Start Node”. Watch credits accumulate. Currently CLI-only — the desktop GUI is in active development.

🛡
Onion routing & encryption

Ed25519 identity, X25519 ECDH + AES-256-GCM per hop, concentric onion routing, and differential privacy noise. No peer sees your full query. Overhead: 0.02%.
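One hop of that onion can be sketched with the `cryptography` package: an X25519 key exchange, a KDF, then AES-256-GCM. The key-schedule details here (HKDF parameters, info label, nonce handling) are illustrative assumptions, not OpenHydra's actual wire format:

```python
# One onion hop: ephemeral X25519 ECDH -> HKDF -> AES-256-GCM.
# Key-schedule details are illustrative, not the real protocol.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def hop_key(private, peer_public) -> bytes:
    """Derive a per-hop AES-256 key from an X25519 shared secret."""
    shared = private.exchange(peer_public)
    return HKDF(algorithm=hashes.SHA256(), length=32,
                salt=None, info=b"openhydra-hop").derive(shared)

# The sender wraps one layer for the next relay...
relay_priv = X25519PrivateKey.generate()   # relay's identity key
sender_eph = X25519PrivateKey.generate()   # sender's ephemeral key

key = hop_key(sender_eph, relay_priv.public_key())
nonce = os.urandom(12)
ciphertext = AESGCM(key).encrypt(nonce, b"inner onion layer", None)

# ...and the relay peels it with the same key, derived from its side:
peeled = AESGCM(hop_key(relay_priv, sender_eph.public_key())).decrypt(
    nonce, ciphertext, None)
print(peeled)  # b'inner onion layer'
```

Each relay can only peel its own layer, which is why no single peer ever sees the full query.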

🌎
Python & TypeScript SDKs Coming Soon

Zero-dependency Python client. Browser-native TypeScript SDK. The internal SDK scaffolding exists — public release and docs are coming in v1.1.


Standing on the shoulders of giants

We didn’t invent this. We just added llamas.

OpenHydra builds directly on two brilliant ideas. We want to be upfront about our inspirations, because intellectual honesty is cool (and mandatory if you don’t want to get ratio’d on Hacker News).

Academic inspiration
🌸 Petals

“Run large language models at home, BitTorrent‑style.” Petals proved that volunteer compute can serve real LLM inference across the internet. We took that idea and bolted on a token economy, a desktop app, and a very strong llama motif.

petals.dev →
Protocol inspiration
🌊 BitTorrent

Since 2001, BitTorrent has proved you can distribute enormous files to billions of people without a central server. If it works for a band’s entire discography, it can work for Qwen 3.5 tokens. Same energy.

bittorrent.com →

🦙 Fun llama fact #1: Real llamas are pack animals because they share the load across the herd. The weakest llama doesn’t carry the whole tent. This is also the core architectural principle of OpenHydra.

🦙 Fun llama fact #2: A group of llamas is called a herd. OpenHydra’s network of peers is also called a herd. We are very consistent in our metaphors and proud of this.

🦙 Fun llama fact #3: The Hydra in Greek mythology had multiple heads — cut one off and two grow back. Our bootstrap nodes work the same way. (Please don’t cut our bootstrap nodes.)

🦙 Fun llama fact #4: Llamas can spit up to 10 feet when stressed. Our nodes politely return HTTP 503 instead. Both are valid responses to being overwhelmed.


What runs on it

It’s Qwen all the way down (mostly)

The default is Qwen 3.5 0.8B — tiny enough for any laptop. Larger models shard automatically across multiple peers. NF4 quantisation cuts VRAM by 4x. Add any HuggingFace model by editing models.catalog.json.

Default · 1 peer · 2 GB
Qwen 3.5 0.8B
Runs on a potato. The default.
Compact · 1 peer · 5 GB
Qwen 3.5 2B
Strong multilingual. Single peer.
Mid-range · 1 peer · 9 GB
Qwen 3.5 4B
Reasoning on a single peer.
Advanced · 2 peers · 18 GB
Qwen 3.5 9B
High-quality reasoning. int8 quantised.
Frontier · 4 peers · 16 GB/peer
Qwen 3.5 27B
int4 quantised. Bring your friends.

5 models in the default catalog. Add any HuggingFace model to models.catalog.json. If the requested model lacks peers, the coordinator gracefully degrades to the nearest available smaller model.
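A catalog entry might look something like this — the field names are our guesses at the schema, since the real models.catalog.json isn’t reproduced on this page:

```json
{
  "id": "openhydra-qwen3.5-0.8b",
  "hf_repo": "your-org/your-model",
  "min_peers": 1,
  "quantization": "nf4",
  "memory_gb": 2
}
```

The values mirror the default card above: one peer, NF4 quantisation, ~2 GB of memory.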

Ready to join the herd?

Your laptop is sitting there doing nothing useful. It could be serving AI tokens and earning credits on the swarm. Three-headed llamas are waiting for you.