mango · ground-up rust port of etcd · pre-alpha
document  ·  v0  ·  engineering brief

mango

A Rust-native distributed key-value store. Linearizable KV over Raft, watch streams, leases, MVCC. Built to beat etcd on ten measurable axes.

License  Apache-2.0 · MSRV  stable · Status  pre-alpha · Replication  Raft · Consistency  Linearizable
§ 01

Etcd's problem space, attacked with Rust.

Mango is not "etcd, rewritten." It is etcd's problem space attacked with a language whose primitives lift specific etcd footguns out of existence at compile time.

Etcd is the reference implementation we study. We are not bound by its Go-isms. No GC means tail latency is bounded by what we do in our own code, never by a runtime collector. Memory safety without a runtime means use-after-free, double-free, and data races on shared memory are not possible in safe Rust — CVEs in this class become impossible-by-construction, not impossible-after-careful-review.

Fearless concurrency via Send and Sync turns "shared mutable state across threads" from a Friday-night page into a compile error. Explicit failure as a value via Result<T, E> makes every fallible operation visible at the call site. Cargo-native supply-chain hygiene comes from cargo-deny, cargo-audit, cargo-vet, and a CycloneDX SBOM.
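A minimal sketch of the two language features named above, using only the standard library. The names `IncrementError` and `parallel_increment` are hypothetical and exist only for this illustration; they are not mango APIs.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum IncrementError {
    /// A thread panicked while holding the lock, so the mutex is poisoned.
    Poisoned,
    /// A worker thread panicked before finishing.
    WorkerPanicked,
}

/// Increments one shared counter from `workers` threads, `per_worker` times each.
fn parallel_increment(workers: usize, per_worker: u64) -> Result<u64, IncrementError> {
    // Arc<Mutex<u64>> is Send + Sync, so it may be moved into other threads.
    // Swapping in Rc<RefCell<u64>> is rejected by the compiler: Rc is not Send.
    let counter = Arc::new(Mutex::new(0u64));
    let mut handles = Vec::with_capacity(workers);

    for _ in 0..workers {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || -> Result<(), IncrementError> {
            for _ in 0..per_worker {
                // Lock failure is a value the caller has to handle, not an exception.
                let mut guard = counter.lock().map_err(|_| IncrementError::Poisoned)?;
                *guard += 1;
            }
            Ok(())
        }));
    }

    for handle in handles {
        // The first `?` handles a panicked worker, the second the worker's own error.
        handle.join().map_err(|_| IncrementError::WorkerPanicked)??;
    }

    let total = *counter.lock().map_err(|_| IncrementError::Poisoned)?;
    Ok(total)
}
```

Calling `parallel_increment(4, 1000)` yields `Ok(4000)`: every increment happens under the lock, and each failure path shows up in the return type at the call site.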

These are the mechanisms. The ten bars below are the measurement. Every PR is judged against them. If a change merely matches etcd, that is a regression relative to the goal: find the win, or find the lever.

§ 02

Ten measurable axes.

Each bar has a comparison oracle (the pinned etcd v3.5.x binary at benches/oracles/etcd/), a hardware signature, and a named test that gates merge. Full bar definitions in the roadmap.
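As a rough illustration of how a bar could gate a merge, the sketch below compares a mango measurement with the etcd oracle measurement and checks the ratio against a target such as the bar 1 figures listed below. The types `Sample` and `Target` and the function `passes` are assumptions made for this example; the real gates are the named tests and scripts referenced under each bar.

```rust
/// One quantity measured for both binaries on the same hardware signature.
#[derive(Debug, Clone, Copy)]
struct Sample {
    mango: f64,
    etcd: f64,
}

/// Direction of a target ratio (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy)]
enum Target {
    /// mango / etcd must be at least this value (higher is better, e.g. throughput).
    AtLeast(f64),
    /// mango / etcd must be at most this value (lower is better, e.g. p99 latency).
    AtMost(f64),
}

/// Returns mango / etcd, or None when the oracle value is unusable.
fn ratio(sample: Sample) -> Option<f64> {
    if sample.etcd > 0.0 && sample.etcd.is_finite() && sample.mango.is_finite() {
        Some(sample.mango / sample.etcd)
    } else {
        None
    }
}

/// A gate passes only when the ratio is computable and meets the target.
fn passes(sample: Sample, target: Target) -> bool {
    match (ratio(sample), target) {
        (Some(r), Target::AtLeast(min)) => r >= min,
        (Some(r), Target::AtMost(max)) => r <= max,
        (None, _) => false,
    }
}
```

With `Target::AtLeast(1.5)`, a sample that merely matches the oracle (ratio 1.0) fails the gate, which is the "matching etcd is a regression" rule expressed as a check.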

  1. Performance: blazing fast.

    Rust's no-GC plus zero-cost abstractions are the structural lever. We target ≥ 1.5× etcd write throughput, ≤ 0.7× p99 latency, ≤ 0.7× idle RSS, ≤ 0.7× cold start, ≤ 0.7× failover.

    benches/runner/{raft,grpc,idle-rss,cold-start,failover}.sh
  2. Concurrency & parallelism.

    Per-core scaling is workload-shaped. Read-only ≥ 14× at 16 cores. Mixed ≥ 8×. Write-heavy ≥ 4× (apply is fundamentally serial in Raft). Zero deadlocks under fuzzed concurrent loads.

    per-core-scaling.sh · loom/* · clippy::await_holding_lock
  3. Reliability.

    Graceful degradation. No thundering herds on follower restart. Bounded recovery time. Disk-full enters read-only, raises NOSPACE, never crashes, never corrupts.

    tests/reliability/*.rs · tests/chaos/*.rs
  4. Correctness: distributed-systems grade.

    Linearizability is the load-bearing claim and we verify it externally. Public Jepsen run in CI. Deterministic simulator on madsim. Porcupine linearizability checker on every recorded history.

    tests/jepsen · tests/simulator · tests/linearizability
  5. Safety: memory-safe by construction.

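    unsafe_code = "forbid" workspace-wide, with exceptions only in audited, named modules that carry documented invariants and Miri tests. Because a forbid-level lint cannot be relaxed by an inner #[allow], those modules must live in crates that opt out of the workspace lint, or use deny there. No panics in steady state: panicking constructs are denied by clippy in non-test code.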

    unsafe_code = "forbid" · MIRIFLAGS=-Zmiri-strict-provenance
  6. Security: defense in depth.

    Memory safety is necessary but not sufficient. SHA-pinned actions, cargo-deny, cargo-audit, cargo-vet, CycloneDX SBOM. Threat model formalized in Phase 12.

    deny.toml · supply-chain/audits.toml · cyclonedx
  7. Large-scale distributed.

    Tier 2 read-scale-out via learner replicas. Up to ~5-10× etcd on linearizable ReadIndex; up to ~2× on bounded-staleness reads. 5-voter + 5-learner cluster on canonical hardware.

    phase 14.5 · benches/runner/read-scale-out.sh
  8. Operability.

    Production-grade defaults. Predictable behavior at scale. OpenTelemetry-native observability with structured logs, OTel traces, richer metrics than etcd ships.

    otel · structured logs · runtime tunables documented
  9. Developer ergonomics.

    Mango should be pleasant to contribute to. Fast CI. Small contribution surface. Expert-gated PR review on every change. cargo nextest + loom + madsim + miri all in scope.

    cargo nextest · CONTRIBUTING.md · expert-gated PRs
  10. Storage efficiency.

    Smaller on-disk footprint than bbolt-based etcd. Faster compaction with bounded read-p99 impact during compaction (≤ 1.5× steady-state). LZ4/zstd block compression configurable.

    benches/runner/disk-size.sh · benches/results/phase-1/
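The disk-full behavior in bar 3 (enter read-only, raise NOSPACE, never crash, never corrupt) can be sketched as a small state machine. This is an illustration under stated assumptions, not mango's storage layer: `QuotaStore`, `PutError`, and the byte-quota stand-in for a full disk are all hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum PutError {
    /// The NOSPACE alarm is raised and the store only serves reads.
    NoSpace,
}

/// In-memory store with a byte quota standing in for disk capacity.
struct QuotaStore {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
    used_bytes: usize,
    quota_bytes: usize,
    nospace_alarm: bool,
}

impl QuotaStore {
    fn new(quota_bytes: usize) -> Self {
        Self { data: BTreeMap::new(), used_bytes: 0, quota_bytes, nospace_alarm: false }
    }

    /// Writes are rejected with an error value once the alarm is raised.
    fn put(&mut self, key: &[u8], value: &[u8]) -> Result<(), PutError> {
        if self.nospace_alarm {
            return Err(PutError::NoSpace);
        }
        // Bytes currently charged to this key, if it already exists.
        let old = self.data.get(key).map_or(0, |v| key.len() + v.len());
        let new_used = self.used_bytes - old + key.len() + value.len();
        if new_used > self.quota_bytes {
            // Raise the alarm before touching any state, so existing data stays intact.
            self.nospace_alarm = true;
            return Err(PutError::NoSpace);
        }
        self.data.insert(key.to_vec(), value.to_vec());
        self.used_bytes = new_used;
        Ok(())
    }

    /// Reads keep working in read-only mode.
    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.data.get(key).map(|v| v.as_slice())
    }

    fn is_read_only(&self) -> bool {
        self.nospace_alarm
    }
}
```

The rejected write leaves `data` and `used_bytes` untouched, so the failure is an error returned to the caller rather than a panic or a partial update. Clearing the alarm (after compaction or added capacity) is omitted here.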
§ 03

Is mango the right tool for me?

Distributed KV stores are not interchangeable. Pick the one whose consistency model and scale ceiling match the problem you have.

| | Mango | etcd | FoundationDB | DynamoDB |
|---|---|---|---|---|
| Consistency | Linearizable | Linearizable | Strict serializable | Eventual; strong opt-in (2× cost) |
| Replication | Raft, single cluster | Raft, single cluster | Multi-version, multi-shard | Hash-sharded, multi-region async |
| Writes / cluster | ≥ 1.5× etcd target · bar #1 | ~50-200K /sec | ~10M /sec (mixed) | ~10-100M /sec global (mixed) |
| Linearizable reads | ~600K /sec (Tier 2b) target | ~50-150K /sec (ReadIndex) | (see above) | Strong-reads-only mode (2× cost) |
| Stale reads | ~1M /sec (Tier 2a) target | ~500K-1M /sec (serializable) | (see above) | Default mode |
| Deployment | Self-host, OSS | Self-host, OSS | Self-host, OSS | AWS-only, hosted |
| Primary use case | Cluster metadata, coordination, config, leader election | Same as mango | Application data, ACID at scale | Application data CRUD at hyperscale |
| Operational profile | Single-binary, deterministic latency (no GC) | Single-binary, Go GC | Multi-process; coordinators, storage, log | Fully managed |
§ 04

Shipped, in flight, planned.

The roadmap progresses phase by phase, expert-gated PR by expert-gated PR. ROADMAP.md is the source of truth; everything below is a snapshot.

Shipped

  • Phase 0: Governance, CI gates, lints, supply-chain tooling.
  • Phase 0.5: Foundation tooling: nextest, loom, madsim, miri, semver-checks, cargo-deny.
  • Phase 1: Single-node storage layer on redb plus tikv/raft-engine, behind a swappable Backend trait.
  • Phase 2: MVCC: Revision, sharded KeyIndex, snapshots via arc_swap, compaction with physical removal, fuzz target on key encoding.
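
To make the Phase 2 vocabulary concrete, here is a deliberately small MVCC sketch: every write gets a new revision, reads can target a past revision, and compaction physically removes superseded versions. It assumes a single `u64` revision (etcd's revision is a main/sub pair) and an unsharded `BTreeMap`; `MvccStore` and its methods are hypothetical and do not reflect mango's `Revision` or `KeyIndex` types.

```rust
use std::collections::BTreeMap;

/// Store-wide, monotonically increasing revision (simplified to one u64).
type Revision = u64;

#[derive(Default)]
struct MvccStore {
    /// (key, revision) -> Some(value) for a put, None for a delete tombstone.
    versions: BTreeMap<(Vec<u8>, Revision), Option<Vec<u8>>>,
    current: Revision,
    compacted: Revision,
}

impl MvccStore {
    fn put(&mut self, key: &[u8], value: &[u8]) -> Revision {
        self.current += 1;
        self.versions.insert((key.to_vec(), self.current), Some(value.to_vec()));
        self.current
    }

    fn delete(&mut self, key: &[u8]) -> Revision {
        self.current += 1;
        self.versions.insert((key.to_vec(), self.current), None);
        self.current
    }

    /// Reads `key` as of revision `at`. For brevity, a compacted or future
    /// revision is reported as None, the same as an absent key.
    fn get_at(&self, key: &[u8], at: Revision) -> Option<&[u8]> {
        if at < self.compacted || at > self.current {
            return None;
        }
        // The newest version of `key` with revision <= `at` decides the result.
        self.versions
            .range((key.to_vec(), 0)..=(key.to_vec(), at))
            .next_back()
            .and_then(|(_, value)| value.as_deref())
    }

    /// Physically removes versions superseded at or before `rev`.
    fn compact(&mut self, rev: Revision) {
        let rev = rev.min(self.current);
        // Latest revision <= rev per key; ascending iteration lets later entries win.
        let mut latest: BTreeMap<Vec<u8>, Revision> = BTreeMap::new();
        for (key, r) in self.versions.keys() {
            if *r <= rev {
                latest.insert(key.clone(), *r);
            }
        }
        self.versions.retain(|(key, r), value| {
            if *r > rev {
                return true; // newer than the compaction point
            }
            // Keep only the surviving version, and drop it too if it is a tombstone.
            latest.get(key) == Some(r) && value.is_some()
        });
        self.compacted = self.compacted.max(rev);
    }
}
```

After `put`, `put`, `delete`, `put` on one key followed by `compact(2)`, revision 1 is gone, revision 2 is still readable, and revisions 3 and 4 are untouched.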

Planned

  • Phase 3: Watch: streaming change notifications, sync & unsync watcher groups, progress notifies.
  • Phase 4: Lease: TTL keys, keep-alive, expiry-driven atomic delete.
  • Phase 5: Raft consensus on tikv/raft-rs. Pipelined replication, ReadIndex, deterministic simulation.
  • Phase 6: gRPC server: KV, Watch, Lease services + the node binary.
  • Phase 7: mangoctl, the etcdctl-equivalent CLI.
  • Phase 13: Robustness: public Jepsen run, deterministic simulator regression suite.
  • Phase 14: Performance push against the pinned etcd v3.5.x oracle.
  • Phase 14.5: Tier 2 read-scale-out via learner replicas.
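
The lease semantics planned for Phase 4 (TTL keys, keep-alive, expiry-driven delete) can be sketched with a logical clock. Everything here is an assumption for illustration: `LeasedStore`, its method names, and the use of integer ticks instead of wall-clock time. In a replicated store the expiry would be applied through the consensus log; here `&mut self` stands in for that atomicity.

```rust
use std::collections::{BTreeMap, BTreeSet};

type LeaseId = u64;

struct Lease {
    ttl: u64,
    expires_at: u64,
    keys: BTreeSet<Vec<u8>>,
}

#[derive(Default)]
struct LeasedStore {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
    leases: BTreeMap<LeaseId, Lease>,
    next_id: LeaseId,
}

impl LeasedStore {
    /// Grants a lease that expires `ttl` ticks after `now`.
    fn grant(&mut self, now: u64, ttl: u64) -> LeaseId {
        self.next_id += 1;
        let lease = Lease { ttl, expires_at: now.saturating_add(ttl), keys: BTreeSet::new() };
        self.leases.insert(self.next_id, lease);
        self.next_id
    }

    /// Pushes the deadline out by one TTL; returns false if the lease is gone.
    fn keep_alive(&mut self, id: LeaseId, now: u64) -> bool {
        match self.leases.get_mut(&id) {
            Some(lease) => {
                lease.expires_at = now.saturating_add(lease.ttl);
                true
            }
            None => false,
        }
    }

    /// Writes a key attached to a live lease; returns false for an unknown lease.
    fn put_with_lease(&mut self, key: &[u8], value: &[u8], id: LeaseId) -> bool {
        match self.leases.get_mut(&id) {
            Some(lease) => {
                lease.keys.insert(key.to_vec());
                self.data.insert(key.to_vec(), value.to_vec());
                true
            }
            None => false,
        }
    }

    /// Revokes every lease whose deadline has passed and deletes all of its
    /// keys in the same call, so no caller observes a half-expired lease.
    fn expire(&mut self, now: u64) -> Vec<LeaseId> {
        let expired: Vec<LeaseId> = self
            .leases
            .iter()
            .filter(|(_, lease)| lease.expires_at <= now)
            .map(|(id, _)| *id)
            .collect();
        for id in &expired {
            if let Some(lease) = self.leases.remove(id) {
                for key in &lease.keys {
                    self.data.remove(key);
                }
            }
        }
        expired
    }

    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.data.get(key).map(|v| v.as_slice())
    }
}
```

A lease that receives a keep-alive at tick 8 with a TTL of 10 survives an `expire(10)` sweep and is removed, together with its keys, at tick 18.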