Architecture
klite is a Kafka-compatible broker designed around simplicity: one Go binary, one process, zero external dependencies.
Design philosophy
Section titled “Design philosophy”Traditional Kafka requires multiple brokers, a controller quorum (KRaft or ZooKeeper), and careful operational tuning. klite takes a different approach:
- Single binary — Download and run. No JVM, no classpath, no cluster coordination.
- Single broker — No replication, no leader election, no inter-broker RPCs. Durability comes from S3, not replicas.
- WAL + S3 — Hot data lives in a local write-ahead log (WAL) on fast storage. Cold data is flushed to S3 for durability and cost efficiency.
- Pass-through bytes — klite never decodes individual records. RecordBatches are opaque byte slices: read header, assign offset, store, serve. No deserialization, no re-serialization.
Data flow
Section titled “Data flow” ┌─────────────┐ │ Clients │ │ (any Kafka │ │ library) │ └──────┬──────┘ │ Kafka wire protocol │ (TCP port 9092) ▼ ┌────────────────┐ │ klite │ │ │ │ ┌──────────┐ │ │ │ Protocol │ │ Produce/Fetch/Groups/Admin │ │ Handlers │ │ │ └────┬─────┘ │ │ │ │ │ ┌────▼─────┐ │ │ │ Cluster │ │ Offset assignment, topic metadata, │ │ State │ │ group coordination │ └────┬─────┘ │ │ │ │ │ ┌────▼─────┐ │ │ │ WAL │ │ Append-only log on local disk │ │ Writer │ │ Batched fsync (2ms window) │ └────┬─────┘ │ └───────┼────────┘ │ ┌──────────┼──────────┐ ▼ ▼ ┌─────────────┐ ┌─────────────┐ │ Local Disk │ │ S3 │ │ (NVMe) │ │ (durable) │ │ │ │ │ │ WAL files │ ───► │ Flushed │ │ metadata │ │ segments │ └─────────────┘ └─────────────┘Request lifecycle
Section titled “Request lifecycle”Produce
Section titled “Produce”- Client sends a Produce request with one or more RecordBatches
- klite validates the batch header (CRC, magic byte, size limits)
- The cluster goroutine assigns monotonically increasing offsets
- Batches are appended to the WAL
- Fsync is batched: writes accumulate over a configurable window (default 2ms), then a single fsync ACKs all waiters
- Response is sent with the assigned base offset
- Client sends a Fetch request with topic-partition-offset tuples
- klite reads from the WAL (hot data in OS page cache) or S3 (cold data)
- Long-polling: if no data is available, the request blocks until data arrives or the timeout expires
- Response streams RecordBatches directly — no per-record copying
Consumer groups
Section titled “Consumer groups”- Consumers join a group via JoinGroup
- The group leader receives member subscriptions and assigns partitions
- Assignments are distributed via SyncGroup
- Heartbeats maintain membership; missed heartbeats trigger rebalance
- Offsets are committed and tracked per group
Performance characteristics
Section titled “Performance characteristics”klite is designed to be bottlenecked by hardware, not software:
| Scenario | Target | Bottleneck |
|---|---|---|
| Produce (batched fsync, 2ms) | 200K-500K msgs/sec | SSD fsync latency |
| Produce (acks=1, no fsync) | 500K-1M msgs/sec | CPU (protocol codec) |
| Fetch (tail, page cache) | 1M+ msgs/sec | Memory bandwidth |
| Fetch (cold, SSD read) | ~2.7M msgs/sec | SSD sequential read |
| Produce latency P50 | <2ms | Fsync batch window |
| Produce latency P99 | <5ms | Fsync + scheduling |
Assumes 1KB messages, PCIe 4.0 NVMe (~3 GB/s sequential), 4+ cores.
Key design decisions for performance
Section titled “Key design decisions for performance”- Zero-allocation hot path — Produce and Fetch allocate near-zero per message. Buffer pools (
sync.Pool), pre-allocated slices, and in-place mutations minimize GC pressure. - Batched fsync — Never fsync per-request. Accumulate writes, fsync once, ACK all waiters.
- OS page cache as read cache — No application-level cache. The kernel manages caching via the page cache.
- Lock-free read path — Immutable WAL segments can be read concurrently without locks. Only writes go through the serialized cluster goroutine.
Concurrency model
Section titled “Concurrency model”klite uses a single cluster goroutine for all write operations:
- Offset assignment
- Topic/partition creation
- Consumer group state transitions
- WAL append
This eliminates locking on the write path. Read operations (Fetch, ListOffsets) are concurrent and lock-free against immutable WAL segments.
Each TCP connection runs in its own goroutine, handling request framing and response ordering. Requests that require cluster state are dispatched to the cluster goroutine via channels.
When to use klite
Section titled “When to use klite”klite is a good fit when:
- You need Kafka protocol compatibility but don’t need a full cluster
- You’re running in dev, staging, CI, or small production environments
- You want simple operations: one binary, one process, one config
- You want S3-backed durability without managing Kafka’s tiered storage
- You’re running on a single node with fast local storage
klite is not a fit when:
- You need multi-broker replication for HA (klite is single-broker)
- You need millions of messages/sec across many nodes
- You need Kafka Connect or server-side Kafka Streams
- You need ZooKeeper or KRaft controller quorum APIs
Comparison
Section titled “Comparison”| klite | Apache Kafka | Redpanda | WarpStream | |
|---|---|---|---|---|
| Architecture | Single binary | Multi-broker cluster | Single binary (C++) | Zero-disk, agents + S3 |
| Storage | WAL + S3 | Local log segments | Local log segments | S3 only |
| Replication | None (S3 durability) | ISR-based | Raft-based | S3 durability |
| Dependencies | None | ZooKeeper/KRaft | None | S3 + metadata store |
| Resource usage | ~20 MB idle | 1+ GB per broker | 1+ GB | Agent is stateless |
| Operations | Single process | Complex | Medium | Medium |
| Best for | Small/medium, dev, edge | Large-scale production | Medium/large production | Cloud-native, large-scale |
Next steps
Section titled “Next steps”- Storage — deep dive into WAL format, S3 flushing, and retention
- Compatibility — supported Kafka APIs and client libraries
- Configuration — tune klite for your workload