mango — a Rust-native distributed key-value store

§ 01

Etcd's problem space, attacked with Rust.

Mango is not "etcd, rewritten." It is etcd's problem space attacked with a language whose primitives lift specific etcd footguns out of existence at compile time.

Etcd is the reference implementation we study. We are not bound by its Go-isms. No GC means tail latency is bounded by what we do in our own code, never by a runtime collector. Memory safety without a runtime means use-after-free, double-free, and data races on shared memory are not possible in safe Rust — CVEs in this class become impossible-by-construction, not impossible-after-careful-review.

Fearless concurrency via Send and Sync turns "shared mutable state across threads" from a Friday-night page into a compile error. Explicit failure as a value via Result<T, E> makes every fallible operation visible at the call site. Cargo-native supply-chain hygiene comes from cargo-deny, cargo-audit, and cargo-vet, plus a CycloneDX SBOM.
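A minimal sketch of the Send/Sync and Result points, using only the standard library. The `KvError` type, the `SharedMap` alias, and both function names are hypothetical illustrations, not part of mango's API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical error type, for illustration only.
#[derive(Debug, PartialEq)]
pub enum KvError {
    KeyNotFound,
    LockPoisoned,
}

/// `Arc<Mutex<_>>` is `Send + Sync`, so it may cross thread boundaries.
pub type SharedMap = Arc<Mutex<HashMap<String, u64>>>;

/// Fallible read: the failure modes are part of the signature, and an
/// unused `Result` triggers a compiler warning at the call site.
pub fn get(map: &SharedMap, key: &str) -> Result<u64, KvError> {
    let guard = map.lock().map_err(|_| KvError::LockPoisoned)?;
    guard.get(key).copied().ok_or(KvError::KeyNotFound)
}

/// Increment `key` from `threads` worker threads, `per_thread` times each.
pub fn concurrent_increment(
    map: &SharedMap,
    key: &str,
    threads: usize,
    per_thread: u64,
) -> Result<(), KvError> {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let map = Arc::clone(map);
            let key = key.to_string();
            thread::spawn(move || -> Result<(), KvError> {
                for _ in 0..per_thread {
                    // The guard is dropped at the end of each iteration.
                    let mut guard = map.lock().map_err(|_| KvError::LockPoisoned)?;
                    *guard.entry(key.clone()).or_insert(0) += 1;
                }
                Ok(())
            })
        })
        .collect();
    for handle in handles {
        // First `?` handles a panicked worker, second `?` its returned error.
        handle.join().map_err(|_| KvError::LockPoisoned)??;
    }
    Ok(())
}
```

Swapping `Arc<Mutex<_>>` for `Rc<RefCell<_>>` makes the `thread::spawn` call fail to compile, because `Rc` is not `Send`.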

These are the mechanisms. The ten bars below are the measurement. Every PR is judged against them. If a change merely matches etcd, that is a regression relative to the goal: find the win, or find the lever.

§ 02

Ten measurable axes.

Each bar has a comparison oracle (the pinned etcd v3.5.x binary at benches/oracles/etcd/), a hardware signature, and a named test that gates merge. Full bar definitions in the roadmap.

01

Performance — blazing fast.

Rust's no-GC plus zero-cost abstractions are the structural lever. We target ≥ 1.5× etcd write throughput, ≤ 0.7× p99 latency, ≤ 0.7× idle RSS, ≤ 0.7× cold start, ≤ 0.7× failover.
benches/runner/{raft,grpc,idle-rss,cold-start,failover}.sh
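Each target above is a ratio against the etcd oracle, so a gate can be thought of as a one-line comparison. The sketch below is an illustration only: `Measurement`, `Bar`, and `passes` are hypothetical names, and the real gates are the shell runners listed above.

```rust
/// One quantity measured for mango and for the etcd oracle, in the same units.
#[derive(Debug, Clone, Copy)]
pub struct Measurement {
    pub mango: f64,
    pub etcd: f64,
}

/// A bar expressed as a bound on the ratio mango / etcd.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Bar {
    /// Higher is better (e.g. write throughput): ratio must be >= the bound.
    AtLeast(f64),
    /// Lower is better (e.g. p99 latency, idle RSS): ratio must be <= the bound.
    AtMost(f64),
}

/// Returns true when the measurement clears the bar.
/// Assumes `m.etcd` is non-zero; a zero oracle value is not handled here.
pub fn passes(bar: Bar, m: Measurement) -> bool {
    let ratio = m.mango / m.etcd;
    match bar {
        Bar::AtLeast(min) => ratio >= min,
        Bar::AtMost(max) => ratio <= max,
    }
}
```

Under this framing, a run that merely matches etcd has a ratio of 1.0 and fails both `Bar::AtLeast(1.5)` and `Bar::AtMost(0.7)`.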
02

Concurrency & parallelism.

Per-core scaling is workload-shaped. Read-only ≥ 14× at 16 cores. Mixed ≥ 8×. Write-heavy ≥ 4× (apply is fundamentally serial in Raft). Zero deadlocks under fuzzed concurrent loads.
per-core-scaling.sh · loom/* · clippy::await_holding_lock
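One way to read these targets is through Amdahl's law. This framing is our assumption for illustration, not the project's stated model, and it further assumes all three targets are measured at 16 cores (the text states 16 cores only for the read-only case).

```rust
/// Amdahl's law: speedup on `cores` when a `serial` fraction of the work
/// cannot be parallelized.
pub fn amdahl_speedup(serial: f64, cores: f64) -> f64 {
    1.0 / (serial + (1.0 - serial) / cores)
}

/// Largest serial fraction that still reaches `target` speedup on `cores`.
/// Derived by solving 1 / (s + (1 - s) / n) >= T for s:
///   s <= (n / T - 1) / (n - 1)
pub fn max_serial_fraction(target: f64, cores: f64) -> f64 {
    (cores / target - 1.0) / (cores - 1.0)
}
```

Under those assumptions, a 4× write-heavy target tolerates a serial fraction of 20%, an 8× mixed target about 6.7%, and a 14× read-only target just under 1%.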
03

Reliability.

Graceful degradation. No thundering herds on follower restart. Bounded recovery time. Disk-full enters read-only, raises NOSPACE, never crashes, never corrupts.
tests/reliability/*.rs · tests/chaos/*.rs
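The disk-full rule can be sketched as a small state machine. This is an illustration under stated assumptions: a byte quota stands in for a full disk, and `QuotaStore`, `Mode`, and `PutError` are hypothetical names, not mango's types.

```rust
use std::collections::BTreeMap;

/// Hypothetical error; `NoSpace` stands in for the NOSPACE alarm.
#[derive(Debug, PartialEq)]
pub enum PutError {
    NoSpace,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    ReadWrite,
    ReadOnly,
}

pub struct QuotaStore {
    data: BTreeMap<String, Vec<u8>>,
    used: usize,
    quota: usize,
    mode: Mode,
}

impl QuotaStore {
    pub fn new(quota: usize) -> Self {
        Self { data: BTreeMap::new(), used: 0, quota, mode: Mode::ReadWrite }
    }

    pub fn mode(&self) -> Mode {
        self.mode
    }

    /// Reads keep working in both modes.
    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.data.get(key).map(Vec::as_slice)
    }

    /// A write that would exceed the quota flips the store to read-only and
    /// returns an error. Existing data is never touched on the failure path.
    pub fn put(&mut self, key: &str, value: &[u8]) -> Result<(), PutError> {
        if self.mode == Mode::ReadOnly {
            return Err(PutError::NoSpace);
        }
        // Bytes currently charged to this key, if it already exists.
        let old = self.data.get(key).map_or(0, |v| key.len() + v.len());
        let new_used = self.used - old + key.len() + value.len();
        if new_used > self.quota {
            self.mode = Mode::ReadOnly;
            return Err(PutError::NoSpace);
        }
        self.data.insert(key.to_string(), value.to_vec());
        self.used = new_used;
        Ok(())
    }
}
```

The sketch has no path back to read-write; how the alarm is cleared is out of scope here.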
04

Correctness — distributed-systems grade.

Linearizability is the load-bearing claim and we verify it externally. Public Jepsen run in CI. Deterministic simulator on madsim. Porcupine linearizability checker on every recorded history.
tests/jepsen · tests/simulator · tests/linearizability
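To make the property concrete, here is a brute-force linearizability check for a single register. It is a teaching sketch, not Porcupine's algorithm or API: it tries every order that respects real time and runs in exponential time. `Op`, `Kind`, and `is_linearizable` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Write(u64),
    /// The value the client observed.
    Read(u64),
}

/// One completed operation with its invocation and return timestamps.
#[derive(Debug, Clone, Copy)]
pub struct Op {
    pub invoke: u64,
    pub ret: u64,
    pub kind: Kind,
}

/// True if some total order of `history` respects real time (an op that
/// returned before another was invoked comes first) and register semantics
/// (every read sees the latest preceding write, or `initial`).
pub fn is_linearizable(history: &[Op], initial: u64) -> bool {
    let mut done = vec![false; history.len()];
    search(history, &mut done, initial, 0)
}

fn search(history: &[Op], done: &mut [bool], state: u64, placed: usize) -> bool {
    if placed == history.len() {
        return true;
    }
    for i in 0..history.len() {
        if done[i] {
            continue;
        }
        // `i` may go next only if no other pending op returned before it began.
        let blocked = history
            .iter()
            .enumerate()
            .any(|(j, o)| j != i && !done[j] && o.ret < history[i].invoke);
        if blocked {
            continue;
        }
        let next_state = match history[i].kind {
            Kind::Write(v) => v,
            Kind::Read(v) if v == state => state,
            Kind::Read(_) => continue, // read disagrees with the register
        };
        done[i] = true;
        if search(history, done, next_state, placed + 1) {
            return true;
        }
        done[i] = false; // backtrack
    }
    false
}
```

A read that overlaps a write may legally observe either the old or the new value; a read that starts after the write has returned must observe the new one.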
05

Safety — memory-safe by construction.

unsafe_code = "forbid" workspace-wide except in audited, named modules with documented invariants and Miri tests. No panics in steady state — denied by clippy in non-test code.
unsafe_code = "forbid" · MIRIFLAGS=-Zmiri-strict-provenance
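For readers less familiar with Rust lint levels, the same policy can be written as attributes on a module. This is a sketch only: the workspace setting above lives in Cargo configuration, the `revision` module is hypothetical, and the three clippy lints named here are our guess at what "no panics" might map to, not the project's confirmed list.

```rust
// `forbid` is stricter than `deny`: a nested `#[allow(unsafe_code)]`
// inside this module is itself a compile error.
#[forbid(unsafe_code)]
// Assumed lint selection; these are real clippy lints, but whether mango
// enables exactly these is not stated in the text.
#[deny(clippy::unwrap_used, clippy::expect_used, clippy::panic)]
pub mod revision {
    use std::num::ParseIntError;

    /// Parses a revision number, returning the error to the caller
    /// instead of panicking on malformed input.
    pub fn parse(text: &str) -> Result<u64, ParseIntError> {
        text.trim().parse::<u64>()
    }
}
```

The clippy lints take effect only when the code is checked with `cargo clippy`; a plain `cargo build` accepts the attributes without enforcing them.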
06

Security — defense in depth.

Memory safety is necessary but not sufficient. SHA-pinned actions, cargo-deny, cargo-audit, cargo-vet, CycloneDX SBOM. Threat model formalized in Phase 12.
deny.toml · supply-chain/audits.toml · cyclonedx
07

Large-scale distributed.

Tier 2 read-scale-out via learner replicas. Target: roughly 5-10× etcd on linearizable ReadIndex reads and up to ~2× on bounded-staleness reads, measured on a 5-voter + 5-learner cluster on canonical hardware.
phase 14.5 · benches/runner/read-scale-out.sh
08

Operability.

Production-grade defaults. Predictable behavior at scale. OpenTelemetry-native observability with structured logs, OTel traces, richer metrics than etcd ships.
otel · structured logs · runtime tunables documented
09

Developer ergonomics.

Mango should be pleasant to contribute to. Fast CI. Small contribution surface. Expert-gated PR review on every change. cargo nextest + loom + madsim + miri all in scope.
cargo nextest · CONTRIBUTING.md · expert-gated PRs
10

Storage efficiency.

Smaller on-disk footprint than bbolt-based etcd. Faster compaction with bounded read-p99 impact during compaction (≤ 1.5× steady-state). LZ4/zstd block compression configurable.
benches/runner/disk-size.sh · benches/results/phase-1/

§ 03

Is mango the right tool for me?

Distributed KV stores are not interchangeable. Pick the one whose consistency model and scale ceiling match the problem you have.

	Mango	etcd	FoundationDB	DynamoDB
Consistency	Linearizable	Linearizable	Strict serializable	Eventual; strong opt-in (2× cost)
Replication	Raft, single cluster	Raft, single cluster	Multi-version, multi-shard	Hash-sharded, multi-region async
Writes / cluster	≥ 1.5× etcd target · bar #1	~50-200K /sec	~10M /sec (mixed)	~10-100M /sec global (mixed)
Linearizable reads	~600K /sec (Tier 2b) target	~50-150K /sec (ReadIndex)	(see above)	Strong-reads-only mode (2× cost)
Stale reads	~1M /sec (Tier 2a) target	~500K-1M /sec (serializable)	(see above)	Default mode
Deployment	Self-host, OSS	Self-host, OSS	Self-host, OSS	AWS-only, hosted
Primary use case	Cluster metadata, coordination, config, leader election	Same as mango	Application data, ACID at scale	Application data CRUD at hyperscale
Operational profile	Single-binary, deterministic latency (no GC)	Single-binary, Go GC	Multi-process; coordinators, storage, log	Fully managed

§ 04

Shipped, in flight, planned.

The roadmap progresses phase by phase, expert-gated PR by expert-gated PR. ROADMAP.md is the source of truth; everything below is a snapshot.

Shipped

Phase 0 · Governance, CI gates, lints, supply-chain tooling.
Phase 0.5 · Foundation tooling: nextest, loom, madsim, miri, semver-checks, cargo-deny.
Phase 1 · Single-node storage layer on redb plus tikv/raft-engine, behind a swappable Backend trait.
Phase 2 · MVCC: Revision, sharded KeyIndex, snapshots via arc_swap, compaction with physical removal, fuzz target on key encoding.
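For readers new to MVCC, the core idea behind Phase 2 can be sketched with the standard library alone. This is not mango's implementation: it has no sharding, no arc_swap snapshots, and no deletes, and every name below (`Mvcc`, `get_at`, `compact`, `Compacted`) is hypothetical.

```rust
use std::collections::BTreeMap;

pub type Revision = u64;

/// Returned when a read asks for a revision that has been compacted away.
#[derive(Debug, PartialEq)]
pub struct Compacted;

#[derive(Default)]
pub struct Mvcc {
    current: Revision,
    compacted: Revision,
    /// key -> (revision -> value written at that revision)
    index: BTreeMap<String, BTreeMap<Revision, String>>,
}

impl Mvcc {
    /// Every write gets a fresh, store-wide revision; old versions are kept.
    pub fn put(&mut self, key: &str, value: &str) -> Revision {
        self.current += 1;
        self.index
            .entry(key.to_string())
            .or_default()
            .insert(self.current, value.to_string());
        self.current
    }

    /// Reads the newest version of `key` written at or before `rev`.
    pub fn get_at(&self, key: &str, rev: Revision) -> Result<Option<&str>, Compacted> {
        if rev < self.compacted {
            return Err(Compacted);
        }
        Ok(self
            .index
            .get(key)
            .and_then(|versions| versions.range(..=rev).next_back())
            .map(|(_, value)| value.as_str()))
    }

    /// Physically removes versions that are no longer visible at `rev`.
    pub fn compact(&mut self, rev: Revision) {
        let rev = rev.min(self.current);
        for versions in self.index.values_mut() {
            // Keep the newest version <= rev (still visible) and all later ones.
            let keep = versions.range(..=rev).next_back().map(|(&r, _)| r);
            if let Some(keep) = keep {
                *versions = versions.split_off(&keep);
            }
        }
        self.compacted = self.compacted.max(rev);
    }

    /// Number of stored versions for `key`, useful for observing compaction.
    pub fn version_count(&self, key: &str) -> usize {
        self.index.get(key).map_or(0, BTreeMap::len)
    }
}
```

After `compact(r)`, reads at `r` or later return the same answers as before, while reads below `r` fail with `Compacted` instead of silently returning stale or missing data.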

Planned

Phase 3 · Watch: streaming change notifications, sync & unsync watcher groups, progress notifies.
Phase 4 · Lease: TTL keys, keep-alive, expiry-driven atomic delete.
Phase 5 · Raft consensus on tikv/raft-rs. Pipelined replication, ReadIndex, deterministic simulation.
Phase 6 · gRPC server: KV, Watch, Lease services + the node binary.
Phase 7 · mangoctl, the etcdctl-equivalent CLI.
Phase 13 · Robustness: public Jepsen run, deterministic simulator regression suite.
Phase 14 · Performance push against the pinned etcd v3.5.x oracle.
Phase 14.5 · Tier 2 read-scale-out via learner replicas.
