Quickstart
Installation
Running SnelDB
The easiest way to get hands-on is the embedded Playground.
- Start SnelDB (dev config enables the Playground by default):
server.http_addr = "127.0.0.1:8085"
[playground]
enabled = true
- Open http://127.0.0.1:8085/ in your browser.
- Type commands like:
DEFINE subscription FIELDS { "id": "int", "plan": "string" }
STORE subscription FOR ctx1 PAYLOAD {"id":1,"plan":"free"}
STORE subscription FOR ctx2 PAYLOAD {"id":2,"plan":"pro"}
QUERY subscription WHERE id=1
Notes
- The UI posts raw command lines to POST /command (no JSON API required).
- Set server.output_format to text (terminal-like) or json (toggle pretty in UI).
- To disable the Playground, set [playground] enabled = false.
A Gentle Guide for Engineers
SnelDB is built to be small and simple. It keeps track of what happened, in order, and makes it easy to get those facts back out quickly. That’s it. This guide will walk you through how to think about events, how to design them so they’re useful, and how to use SnelDB’s tiny set of commands: DEFINE, STORE, QUERY, and REPLAY. Along the way we’ll use a retail shop as an example, but the same ideas apply in many domains.
Quick start: the easiest way to try SnelDB is to open the embedded Playground at http://127.0.0.1:8085/ (enabled by default in dev). Type commands directly and see results immediately.
Why events?
An event is just a record that something happened: an order was created, a customer signed up, a parcel was delivered. Events don’t change once they’re stored. By keeping them all, you get a trustworthy history. Your application can look back, replay them, and figure out the current state whenever it needs. SnelDB focuses on storing these events and letting you fetch them again quickly. The “what do these events mean?” part stays in your application.
Two ways of reading
With SnelDB, there are really only two ways you read:
- Replay a timeline for one thing. All the events for a single context_id (like an order, a customer, or a device) form a story. If you REPLAY FOR order-9001, you’ll see every event for that order in sequence. Your code can fold those into the current state (see the sketch below).
- Query across many things. Sometimes you don’t want the whole story of one order, you want a slice across all orders. For that, you use QUERY. For example: QUERY order_created WHERE status="submitted". Behind the scenes, SnelDB uses tricks like enum bitmaps and filters to make those queries quick, so you don’t have to think about indexes.
If you remember one thing: replay for one thing’s story, query for slices across many things.
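As a rough sketch of that folding step in application code (the event types, fields, and last-status-wins rule are illustrative; SnelDB itself only hands your application the events in order):

// Hypothetical events replayed in order, e.g. from REPLAY FOR order-9001.
struct Event {
    event_type: &'static str,
    status: Option<&'static str>,
}

// Fold the timeline into a current state: here, simply the latest status seen.
fn current_status(timeline: &[Event]) -> Option<&'static str> {
    timeline
        .iter()
        .inspect(|e| println!("replayed: {}", e.event_type)) // each page of the story, in order
        .filter_map(|e| e.status)
        .last()
}

fn main() {
    let timeline = [
        Event { event_type: "order_created", status: Some("pending") },
        Event { event_type: "order_status_changed", status: Some("submitted") },
        Event { event_type: "shipment_delivered", status: None },
    ];
    println!("current status: {:?}", current_status(&timeline));
}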
Choosing a context
So what is this context_id? Think of it as “whose story am I telling?” For a retail system:
- An order has a start and an end, so it makes sense to use order-<id> as the context.
- Inventory belongs to a SKU, so sku-<code> is a context.
- A customer profile belongs to a customer, so customer-<id> works.
When you want to be able to say “show me everything that ever happened to X”, that X should be a context.
Designing an event
Name events the way you’d explain them to a teammate: order_created, customer_registered, shipment_delivered. Keep the payload small and clear. Always include:
- The IDs you’ll need to filter by later (order_id, customer_id, sku).
- Enums for fixed sets of values. For example: "plan": ["basic", "pro", "enterprise"]
- A timestamp for when it happened.
Here are a few examples:
DEFINE customer_registered FIELDS {
"customer_id":"string",
"email":"string",
"plan":["basic","pro","enterprise"],
"created_at":"timestamp"
}
DEFINE order_created FIELDS {
"order_id":"string",
"customer_id":"string",
"status":["pending","submitted","cancelled"],
"created_at":"timestamp"
}
DEFINE shipment_delivered FIELDS {
"shipment_id":"string",
"order_id":"string",
"carrier":["UPS","DHL","FedEx"],
"delivered_at":"timestamp"
}
Storing events
The very first need is to record facts: something happened, and you want to keep it. Writing an event in SnelDB is just that—adding a new fact to the timeline.
STORE customer_registered FOR customer-123
PAYLOAD {"customer_id":"123","email":"a@b.com","plan":"pro"}
STORE order_created FOR order-9001
PAYLOAD {"order_id":"9001","customer_id":"123","status":"pending"}
STORE shipment_delivered FOR ship-5001
PAYLOAD {"shipment_id":"5001","order_id":"9001","carrier":"UPS"}
Later on, when dealing with retries or external systems, you might add optional fields like idempotency_key. But the heart of storing events is simply: write down the fact.
Reading events
If you want to know the current state of one thing, replay its story:
REPLAY FOR order-9001
If you want to know which events match a condition across many things, query:
QUERY order_created WHERE customer_id="123"
If you need to follow a chain—like from an order to its shipment—query by the keys you included in the payload:
QUERY shipment_delivered WHERE order_id="9001"
How to evolve
SnelDB is built on immutability. Once an event is stored it never changes. If the shape of an event needs to change, we don’t edit old events or add fields onto them. Instead, we create a new version of the schema or define a new event type that represents the new shape.
Older events remain valid and replayable; newer ones follow the updated schema. This way, every event clearly shows which version of the schema it follows, and your code can handle old and new versions side by side. Immutability guarantees that history is stable, while evolution ensures you can keep writing new chapters without breaking the old ones.
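For example, using the AS <version> form described in the DEFINE reference later in this book, a second version of order_created can add an optional note field (the new field is purely illustrative) while version 1 events stay valid and replayable:

DEFINE order_created AS 2 FIELDS {
"order_id":"string",
"customer_id":"string",
"status":["pending","submitted","cancelled"],
"note":"string | null",
"created_at":"timestamp"
}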
Scaling without extra knobs
You don’t manage indexes or query planners. You simply design your events with the right fields. SnelDB takes care of compression and filtering internally. If a query feels heavy, ask yourself: did I include the right key in the payload?
Streaming
If you need near‑real‑time processing, you don’t need a new command. Just poll with SINCE on your timestamp:
QUERY order_created WHERE created_at >= "2025-09-07T00:00:00Z" LIMIT 1000
Keep track of the last event you saw in your application and continue from there.
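For example (timestamps are illustrative), a poller might start with:

QUERY order_created SINCE "2025-09-07T00:00:00Z" LIMIT 1000

and, after recording the newest timestamp it processed, continue from that cursor on the next round:

QUERY order_created SINCE "2025-09-07T00:05:42Z" LIMIT 1000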
Other domains
- Billing: replay a subscription’s events to learn its current plan; query invoices or payments by customer_id.
- IoT: replay one device’s events to see its config; query telemetry since last night.
- Logistics: replay one parcel’s journey; query all parcels delivered today.
What SnelDB won’t do
SnelDB will never enforce your workflows, run aggregates, or decide who is allowed to see data. Those belong in your application or other tools. SnelDB’s job is narrower: keep facts safe, and give them back quickly.
A closing picture
Think of two simple moves:
- Down: replay the whole story for one thing.
- Across: query slices across many things.
Nearly everything you need can be done by combining these two moves. The database is small on purpose. If you design your events carefully, SnelDB will give you speed and reliability without ever getting in your way.
SnelDB Design Philosophy
Hassle-free by design
SnelDB is small on purpose. You don’t need to learn dozens of commands, fiddle with query planners, or manage indexes. Four simple verbs (DEFINE, STORE, QUERY, REPLAY) cover almost everything you need. Less to remember, less to break.
Immutability at the core
Facts don’t change once written. Immutability makes your history reliable and auditable. If things evolve, you add new event types or new schema versions. Old events remain intact; new ones live alongside them.
Evolution over correction
Rather than patching or rewriting, you let the story grow. Each new event is another page in the log. That makes timelines honest, reproducible, and easy to debug.
Performance without knobs
SnelDB is built for performance, but you don’t need to manage any of it. Internally, it uses shards to spread load, an LSM-tree design to keep writes fast, and columnar storage with enum bitmaps and XOR filters to make queries efficient. You never have to tune these parts yourself—they just work in the background so you can focus on your application.
Universal patterns
Two simple movements cover most use cases:
- Replay one context’s timeline to rebuild its state.
- Query across many contexts with filters.
This model is the same whether you’re preparing order data in retail, collecting device signals in IoT, managing subscriptions in SaaS, or feeding clean event streams into data and AI/ML teams for training and analysis.
Staying in its lane
SnelDB doesn’t do business logic, aggregations, or access control. Those belong in your services and tools. The database’s job is to keep track of everything faithfully and give it back quickly.
Introduction
SnelDB is a lightweight, high‑performance database for immutable events. You append facts, then filter or replay them—quickly and reliably.
What it is
- Store: append events with a type, context_id, timestamp, and payload
- Query: filter by event type, context, time, and conditions
- Replay: stream events for a context in original order
DEFINE payment FIELDS {"amount":"int","status":"string"}
STORE payment FOR user-123 PAYLOAD {"amount":250,"status":"verified"}
QUERY payment WHERE status="verified"
REPLAY FOR user-123
Why it exists
General-purpose databases and queues struggle with large, evolving event logs. SnelDB is built for:
- Immutable, append-only data
- Fast filtering at scale (columnar + pruning)
- Ordered replay per context
- Minimal ops overhead
Key features
- Append-only storage (perfect audit trails; predictable recovery)
- Simple, human‑readable commands (JSON‑native)
- Fast queries at scale (shards, zones, compaction)
- Replay built in (time‑travel debugging, sequence modeling)
- Flexible schemas (strict validation; optional fields)
- Lightweight & safe (Rust; embeddable; no GC)
Who it’s for
- Product analytics and auditing
- ML pipelines on event sequences
- Operational debugging and timeline reconstruction
Why SnelDB?
Most databases were never built for events.
They're optimized for records that change: customer profiles, inventory counts, order statuses. But in the real world, especially in modern systems and data pipelines, we’re dealing more and more with things that happened — not things that are.
- A user signed up.
- A sensor pinged.
- A document was approved.
- A model prediction was stored.
These aren’t updates. They’re facts. Immutable. Time-stamped. Contextual.
The gap
If you’ve tried to build on top of these kinds of events, you've probably run into one of these:
- Slow queries over millions of records because you're using a general-purpose SQL database
- Too much ceremony, it’s painful to rebuild a timeline of actions (what happened, when, and in what order)
- Custom tooling just to read back historical behavior
- Mixing logs and storage (Kafka for ingest, S3 for storage, Athena for queries… and duct tape in between)
- Hard to filter, trace, or correlate things once the data grows
And if you work in AI or data science, you’ve probably dealt with brittle pipelines, long joins, and the question:
“How do I get all the events for this user/session/date range — and trust the output?”
The idea
SnelDB was born to make event-driven storage and retrieval feel natural — for developers, data engineers, and model builders alike.
It’s a database designed from scratch for:
- Immutable, append-only data
- High-throughput ingest
- Fast filtering and replay
- Event-type-aware columnar storage
- Schema evolution without migrations
- Minimal operational overhead
You store events. You query them. You replay them. That’s it. It does the rest — segmenting, zoning, indexing, compaction — in the background.
Why not just use X?
- Kafka? Great for streaming, not for historical querying.
- PostgreSQL? Fantastic RDBMS, but not built for multi-billion-row event logs.
- Snowflake? Powerful, but heavy and expensive for interactive filtering.
- ClickHouse? Blazing fast, but not optimized for replay semantics and evolving schemas.
SnelDB is a sweet spot: light like SQLite, fast like ClickHouse, event-native like Kafka — but simple to reason about.
Built for builders
Whether you’re:
- Building product analytics dashboards from raw event logs
- Tracking user behavior over time, across sessions or contexts
- Training machine learning models on real-world event sequences
- Auditing critical flows or investigating anomalies
- Archiving time-stamped data for compliance or reporting
- Creating time-travel debugging tools or operational replay systems
SnelDB gives you a clean, reliable foundation to work with immutable facts — fast to store, easy to query, and simple to reason about.
Simple to embed. Easy to query. Scales with clarity.
That’s why we built SnelDB.
Stories from the field
To see why SnelDB exists, it helps to look at a few real situations where traditional tools fall short.
- Product analytics at scale: A growing SaaS company wants to track how users move through their app. At first, PostgreSQL is fine. But soon the tables balloon into billions of rows. Queries slow to a crawl, analysts create brittle pipelines, and nobody fully trusts the numbers. With SnelDB, they could store clean, immutable event streams, filter them quickly by context, and build dashboards that actually stay fast as volume grows.
- Machine learning pipelines: A data science team trains fraud detection models using transaction histories. They struggle to rebuild consistent training sets: data is scattered across Kafka topics, S3 buckets, and ad-hoc SQL queries. With SnelDB, they can reliably fetch “all sequences of events leading to flagged outcomes,” ensuring reproducibility and shortening the path from raw logs to usable training data.
- Auditing in regulated industries: A fintech startup needs to prove to auditors what happened, when, and by whom. Traditional databases allow updates and deletes, which introduces doubt. SnelDB’s append-only design guarantees that past events remain untouched, making it straightforward to demonstrate compliance with minimal operational effort.
- Operational debugging: An infrastructure engineer gets paged at 2am for a production outage. Logs are rotated, metrics are sampled, and the picture is incomplete. With SnelDB, they can replay the exact sequence of system events leading up to the failure, reconstruct the timeline, and pinpoint the root cause without guesswork.
What is SnelDB?
SnelDB is a lightweight, high-performance database designed for immutable events.
At its core, it’s a system where you can:
- Store events in an append-only fashion
- Query them efficiently by type, context, or time
- Replay them in order to understand what happened
That’s it. No updates. No deletes. Just fast, reliable access to a growing stream of facts.
Not your average database
SnelDB is not a general-purpose relational database, a message broker, or a data lake. It’s a specialized tool focused on event-driven data:
- Unlike a relational database, SnelDB doesn’t model changing rows. It treats data as a log of things that happened.
- Unlike a message queue, it’s built for storage and querying, not just delivery.
- Unlike a data warehouse, it’s lightweight and easy to embed in everyday applications.
Think of it as a database that embraces time and immutability as first-class concepts.
A mental model
The easiest way to think about SnelDB is:
- A notebook for your system’s history: every line is a fact, recorded once, never erased.
- A timeline you can slice and filter: events are grouped by type, context, and time, so you can quickly zoom in.
- A replay button: if you need to reconstruct a past sequence, you can ask SnelDB to play it back in order.
A simple example
Imagine you’re building a payments system.
You might store events like:
{ "event_type": "payment_initiated", "context_id": "user_123", "payload" : { "amount": 100 }, "timestamp": "2025-08-20T09:30:00Z" }
{ "event_type": "payment_verified", "context_id": "user_123", "payload" : { "amount": 100 }, "timestamp": "2025-08-20T09:31:00Z" }
{ "event_type": "payment_settled", "context_id": "user_123", "payload" : { "amount": 100 }, "timestamp": "2025-08-20T09:35:00Z" }
Later, you might want to:
- Fetch all payment_initiated events from last week
- Replay all events for user_123 in order
- Filter for verified payments over $500
And maybe even more:
- Compare the average settlement time for all payments last month
- Find all users who initiated a payment but never settled
- Retrieve the full sequence of events for a disputed transaction
- Generate a distribution of payment amounts across different countries
- Train a model using all past transactions, keeping the exact order of events intact
In a traditional setup, you’d stitch together logs, SQL queries, and custom scripts. With SnelDB, these queries are first-class citizens. For example:
QUERY payment_initiated SINCE 2025-08-01
or:
REPLAY FOR user_123
even like:
QUERY payment_verified WHERE amount > 500
Instead of thinking in terms of tables and joins, you think in terms of events. SnelDB is designed so the way you ask matches the way you think: “What happened? When? For whom?”
Key Features
SnelDB is small in surface area but powerful in practice. Here are the highlights that make it different:
1. Append-only storage
Events are immutable. Once stored, they’re never updated or deleted — which means:
- Perfect audit trails
- Predictable replay of past behavior
- No risk of hidden mutations breaking analysis
2. Simple, human-readable commands
No SQL boilerplate. No obscure APIs. SnelDB has a compact command language that reads like plain English:
DEFINE payment FIELDS { "amount": "int", "status": "string" }
STORE payment FOR user-123 PAYLOAD {"amount": 250, "status":"verified"}
QUERY payment WHERE status="verified"
REPLAY FOR user-123
Fast to learn. Easy to remember. Case-insensitive and JSON-native.
3. Fast queries at scale
Under the hood, SnelDB uses an LSM-tree design with:
- Shards for parallelism
- Zones and filters to skip irrelevant data
- Compaction to keep reads efficient over time
The result: queries stay snappy whether you have thousands or billions of events.
4. Replay built in
You don’t just query — you can replay events in order:
REPLAY order_created FOR customer-42
This makes debugging, time-travel analysis, and sequence modeling natural parts of the workflow.
5. Flexible schemas
SnelDB supports schema definitions per event type, with:
- Strict validation: payloads must match fields
- Optional fields: declared as string | null
- Clear errors when something doesn’t line up
This keeps data trustworthy without slowing you down.
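For instance, a schema might mark one field as optional with a string | null union (the event and field names here are illustrative):

DEFINE signup FIELDS { "plan": "string", "referrer": "string | null" }

A payload can then set referrer to null when there is nothing to record; any other mismatch is still rejected with a clear error.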
6. Designed for AI & analytics
Because events are ordered, immutable, and replayable, SnelDB is a natural fit for:
- Training models on real-world sequences
- Feeding pipelines with reproducible datasets
- Analyzing behavior over time without complex joins
- Auditing decision processes with confidence
7. Lightweight & embeddable
SnelDB is written in Rust with minimal dependencies. It runs anywhere — from a laptop dev setup to production servers — without heavyweight orchestration.
You can drop it into your stack as a focused, reliable event database.
Built-in Playground
SnelDB ships with a tiny single-page Playground so you can try commands without any client code.
Start the server and open http://127.0.0.1:8085/ to experiment with DEFINE, STORE, QUERY, and REPLAY instantly.
8. Safety by design
SnelDB is built in Rust, which brings memory safety, thread safety, and performance without garbage collection.
This means:
- No segfaults or memory leaks corrupting your data
- Concurrency without data races
- Predictable performance, even under load
When you’re storing critical events, safety is not optional — and Rust helps guarantee it from the ground up.
In short: SnelDB is designed to be small but sharp — a tool that does one thing well: make working with immutable events simple, fast, and reliable.
Commands
SnelDB has a compact, human-friendly command language. Keywords are case-insensitive (store, STORE, StOrE all work). Event type names and context IDs are case-preserving.
Core verbs:
- DEFINE — declare a schema for an event type
- STORE — append a new event with a JSON payload
- QUERY — filter events
- REPLAY — stream events in original order (per context, optionally per type)
- FLUSH — force a memtable → segment flush
- PING — health check
If a command returns no rows, you’ll see: No matching events found.
See pages below for full syntax and examples.
Syntax & Operators
Command forms
DEFINE <event_type> FIELDS { "key": "type", … }
STORE <event_type> FOR <context_id> PAYLOAD <json_object>
QUERY <event_type> [FOR <context_id>] [SINCE <timestamp>] [WHERE <expr>] [LIMIT <n>]
REPLAY [<event_type>] FOR <context_id> [SINCE <timestamp>]
FLUSH
- Keywords: case-insensitive.
- Literals:
  - Strings: double-quoted ("NL", "a string").
  - Numbers: unquoted (42, 3, 900).
  - Booleans: true, false (unquoted).
- WHERE operators: =, !=, >, >=, <, <=, AND, OR, NOT.
- Precedence: NOT > AND > OR. Parentheses are not required in the current grammar; structure conditions so the default precedence expresses what you mean (see the example after this list).
- LIMIT: positive integer; caps returned rows.
- SINCE: ISO-8601 timestamp string (e.g., 2025-08-01T00:00:00Z). When present, only events at/after this instant are considered.
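A small illustration of the precedence rules, assuming an event type whose schema defines both status and amount:

QUERY order_created WHERE status = "cancelled" OR status = "submitted" AND amount >= 40

Because AND binds tighter than OR, this matches every cancelled event plus the submitted events with amount >= 40; NOT, when used, binds tighter than both.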
Mini-grammar (informal)
expr := cmp | NOT expr | expr AND expr | expr OR expr
cmp := field op value
op := = | != | > | >= | < | <=
value := string | number | boolean
Examples
DEFINE order_created AS 1 FIELDS {
id: "uuid",
amount: "float",
currency: "string"
}
STORE order_created FOR ctx_123 PAYLOAD {
"id": "a1-b2",
"amount": 42.5,
"currency": "EUR",
"tags": ["new","vip"],
"flag": true
}
QUERY order_created FOR "ctx_123" SINCE "2025-08-01T00:00:00Z"
WHERE amount >= 40 AND currency = "EUR"
LIMIT 100
REPLAY order_created FOR ctx_123 SINCE "2025-08-01T00:00:00Z"
DEFINE
Purpose
Register the schema for an event type. STORE payloads must conform to this schema.
Form
DEFINE <event_type:WORD> [ AS <version:NUMBER> ] FIELDS { "key_1": "type_1", ... }
Field pairs
- Keys can be STRING or WORD. The parser will quote WORD keys when converting to JSON.
- Values (types) can be:
- STRING literals, for example: "int", "string", "string | null"
- ARRAY of strings to define an enum, for example: ["pro", "basic"]
- Enum variants are case-sensitive ("Pro" != "pro")
- Schema must be flat (no nested objects).
Examples
DEFINE order_created FIELDS { "order_id": "int", "status": "string" }
DEFINE review FIELDS { rating: "int", verified: "bool" }
DEFINE order_created AS 2 FIELDS { order_id: "int", status: "string", note: "string | null" }
DEFINE subscription FIELDS { plan: ["pro", "basic"] }
Typical validation errors raised during STORE
- No schema defined
- Missing field status in payload
- Field order_id is expected to be one of int, but got String
- Payload contains fields not defined in schema: invalid_field
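For example, given the order_created schema defined above, a STORE that omits status is rejected rather than written:

STORE order_created FOR customer-1 PAYLOAD {"order_id":123}

The response reports an error along the lines of “Missing field status in payload”, and nothing is appended.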
Store
Purpose
Append an event for a specific context.
Form
STORE <event_type:WORD> FOR <context_id:WORD or STRING> PAYLOAD {"key":"value", ...}
Constraints
- <context_id> can be a WORD (example: user-1) or a quoted STRING.
- PAYLOAD must be a flat JSON object (no nested objects).
- PAYLOAD must follow the schema defined using the DEFINE command.
Examples
STORE order_created FOR customer-1 PAYLOAD {"order_id":123,"status":"confirmed"}
STORE review FOR "user:ext:42" PAYLOAD {"rating":5,"verified":true}
STORE login FOR user-7 PAYLOAD {"device":"android"}
Behavior
- Validates payload against the schema of the event type.
- Rejects missing or extra fields and type mismatches.
- Durability-first: once acknowledged, the event will survive crashes.
Errors
- <event_type> cannot be empty
- <context_id> cannot be empty
- Schema validation errors (see DEFINE)
- Overload/backpressure (rare): Shard is busy, try again later
QUERY
Purpose
Filter events by type, optionally by context, time, predicate, and limit.
Form
QUERY <event_type:WORD>
[ FOR <context_id:WORD or STRING> ]
[ SINCE <timestamp:STRING> ]
[ RETURN [ <field:WORD or STRING>, ... ] ]
[ WHERE <expr> ]
[ LIMIT <n:NUMBER> ]
Examples
QUERY order_created WHERE status="confirmed"
QUERY order_created WHERE status=confirmed
QUERY order_created WHERE id > 13 AND id < 15
QUERY order_created WHERE country!="NL"
QUERY order_created WHERE country="NL" OR country="FR"
QUERY login FOR user-1 WHERE device="android"
QUERY payment SINCE "2025-08-01T00:00:00Z" WHERE amount >= 500 LIMIT 100
QUERY product RETURN [name, "price"] WHERE price > 10
Notes
- SINCE is a STRING timestamp.
- RETURN [ ... ] limits the payload fields included in results. Omit to return all payload fields. An empty list RETURN [] also returns all payload fields.
- Field names in RETURN can be bare words or quoted strings.
- Works across in-memory and on-disk segments.
- If nothing matches, returns: No matching events found.
Gotchas
- Field names used in WHERE must exist in the schema for that event type.
- Strings must be double-quoted when you need explicit string literals.
- Unknown fields in RETURN are ignored; only schema-defined payload fields (plus the core fields context_id, event_type, timestamp) are returned.
REPLAY
Purpose
Stream events back in their original append order for a context, optionally restricted to one event type.
Form
REPLAY [ <event_type:WORD> ]
FOR <context_id:WORD or STRING>
[ SINCE <timestamp:STRING> ]
[ RETURN [ <field:WORD or STRING>, ... ] ]
Variants
- All event types:
REPLAY FOR <context_id>
- Only specific event types:
REPLAY <event_type> FOR <context_id>
Examples
REPLAY FOR alice
REPLAY order_shipped FOR customer-99
REPLAY FOR "user:ext:42" SINCE "2025-08-20T09:00:00Z"
REPLAY product FOR user-1 RETURN ["name"]
Behavior
- Routes to the shard owning the context ID.
- Preserves original order.
- If nothing matches: No matching events found.
- RETURN [ ... ] limits payload fields in the replayed events. Omit or use RETURN [] to include all payload fields. Unknown fields are ignored; core fields (context_id, event_type, timestamp) are always present.
Flush
Purpose
Force a memtable flush into an immutable segment.
Form
FLUSH
Notes
Useful for tests, checkpoints, or when you want on-disk segments immediately. Not required for correctness; ingestion continues during flush.
Design
Overview
What this section is
- A short tour of how SnelDB works inside: the big components and how data flows between them.
- Enough context for contributors to find their bearings without reading the whole codebase first.
The big picture
- Commands enter via frontends (TCP/UNIX/HTTP) and are parsed, validated, and dispatched.
- Writes go through a WAL for durability, land in an in-memory table, and get flushed into immutable segments on disk.
- Reads (query/replay) scan the in-memory table and segments, skipping as much as possible using zone metadata and filters.
- Background compaction keeps segments tidy so read performance stays predictable.
- Sharding by context_id spreads work and makes per-context replay cheap.
Lifecycle at a glance
- DEFINE: register or update the schema for an event type (used to validate STORE).
- STORE: validate payload → append to WAL → apply to MemTable → later flush to a new segment.
- QUERY: fan out to shards, prune zones and project only needed columns, evaluate predicates, merge results.
- REPLAY: route to the shard for the context_id, stream events in original append order (optionally narrowed by event type).
- FLUSH: force a MemTable flush to produce a new immutable segment (useful in tests/checkpoints).
What runs where
- Commands and flow control: command/parser, command/dispatcher, command/handlers.
- Storage engine: engine/core/* for WAL, memory, segments, zones, filters; engine/store, engine/query, engine/replay.
- Sharding and concurrency: engine/shard/* (manager, worker, messages).
- Background work: engine/compactor/* for segment merging and cleanup.
- Wiring and I/O: frontend/* listeners; shared/* for config, responses, logging.
Key guarantees (high level)
- Durability once a STORE is acknowledged (WAL first).
- Immutability of events and on-disk segments (compaction replaces whole files, never edits in place).
- Ordered replay per context_id.
- Schema-validated payloads (strict by default, optional fields via union types).
- Bounded memory via shard-local backpressure.
What this section doesn’t do
- It won’t dive into file formats or algorithmic details; those live in the focused pages that follow.
- It won’t prescribe ops/production practices; see the development/operations parts of the book.
How to use this section
Skim this page, then jump to the piece you’re touching:
- Changing parsing or adding a command
- Touching durability/flush/segment files
- Threading, channels, and routing
- Anything read-path related
- Background merging/policies
- Config, responses, logging, tests
That’s the map. Next pages keep the same tone and size: just enough to guide you to the right code.
Core concepts
- Event: time-stamped, immutable fact with a typed payload
- Event type & schema: defined via DEFINE, validates payload shape
- Context: groups related events under a context_id
- Shard: independent pipeline — WAL → MemTable → Flush → Segments
- WAL: per-shard durability log; replayed on startup
- MemTable: in-memory buffer; flushed when full
- Segment: immutable on-disk unit with columns, zones, filters, indexes
- Zone: fixed-size block inside a segment with pruning metadata
- Compaction: merges small segments to keep reads predictable
Storage Engine
Overview
The storage engine turns incoming events into durable, immutable data you can query quickly. It’s built around append-only writes, in-memory buffering, and on-disk segments that are efficient to scan and easy to skip.
Core Components
- WAL (write-ahead log): Per-shard durability log. Every accepted event is appended here first.
- MemTable: In-memory buffer for recent events. Fast inserts; swapped out when full.
- Flush worker: Converts a full MemTable into an immutable on-disk segment in the background.
- Segments: On-disk building blocks (columns, zone metadata, filters, lightweight indexes).
- Snapshots: Optional utility files (.snp events, .smt metadata) for export/replay and range bookkeeping.
- Compactor (covered later): Merges small segments into larger ones to keep reads predictable.
Write Path (At a Glance)
- Validate payload against the event type schema.
- Append to the WAL (durability point).
- Apply to the MemTable (fast in-memory structure).
- When the MemTable hits a threshold, swap it out and enqueue a background flush.
- Flush worker writes a new segment and publishes it atomically.
See the diagram below:
Write Path (In Depth)
0) Validate the event
- What: Check the incoming payload against the registered schema for its event_type.
- Why: Ensures only well-formed data enters the system so downstream files and indexes remain consistent.
Example:
{
"timestamp": 1700000000,
"event_type": "signup",
"context_id": "user-42",
"payload": { "plan": "pro", "country": "US" }
}
Equivalent command:
STORE signup FOR user-42 PAYLOAD {"plan":"pro","country":"US"}
Validation ensures required fields exist and types are correct (for example, the event_type is known and a "plan" is provided in the payload).
1) Append to the WAL (durability point)
- What: Append the validated event to the per-shard Write-Ahead Log (WAL).
- Why: Once the append returns, the event will survive a crash. On restart, the system replays WAL entries to rebuild in-memory state and complete any interrupted flushes.
- Notes:
- WAL records are lightweight, line-oriented appends (JSON-serialized per line).
- WAL files rotate in sync with the MemTable flush threshold (engine.flush_threshold), so replay windows are bounded by flush points. After a successful flush, older WAL files up to that cutoff can be pruned.
- Behavior is tunable via config: [wal] enabled, dir, buffered, buffer_size, flush_each_write, fsync, fsync_every_n and [engine] flush_threshold (a sample snippet follows below).
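As a sketch, those sections might look like this in config.toml (the key names come from the list above; the values and the wal directory path are illustrative, not documented defaults):

[wal]
enabled = true
dir = "../data/wal"        # illustrative location
buffered = true
buffer_size = 65536
flush_each_write = false
fsync = true
fsync_every_n = 100

[engine]
flush_threshold = 32768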
Crash safety example:
- If the process crashes after the WAL append but before the event hits memory, recovery will re-insert it into the MemTable on startup.
2) Insert into the MemTable (fast in-memory apply)
- What: Place the event into the in-memory, append-friendly, queryable buffer (MemTable).
- Why: Absorb writes in memory to batch them into large, sequential segment writes (avoids random I/O), maintain backpressure with bounded memory, and maximize ingest throughput. As a secondary benefit, new events are immediately visible to queries.
- Behavior:
- The MemTable is sized by flush_threshold (config). When it reaches capacity, it triggers a swap and a background flush.
- The MemTable is sized by
Small example:
flush_threshold = 4
- Incoming events (in order): A, B, C, D, E
- A, B, C, D go into the active MemTable. After D, the MemTable is full.
- A background flush is enqueued for these four; a fresh MemTable becomes active.
- E enters the new MemTable immediately (no blocking on the background flush).
3) Swap and enqueue a background flush
- What: When the active MemTable is full, it’s atomically swapped for a fresh, empty one, and the full snapshot is queued for flushing.
- Why: Writers remain responsive (no long I/O in the foreground) and the system maintains bounded memory.
- Details:
- The passive MemTable (now immutable) is handed off to the flush worker.
- Writes proceed into the newly created active MemTable.
4) Flush worker writes a new immutable segment
- What: The background worker turns the passive MemTable into an on-disk segment directory (for example, segment-00042/).
- Inside the segment:
  - Column files: One file per field, optimized for sequential appends and later memory-mapped (mmap) access. Naming: <uid>_<field>.col. Example: u01_timestamp.col, u01_event_type.col, u01_context_id.col, u01_plan.col, u01_country.col, where <uid> is defined per event type.
  - Zone metadata: Per-zone min/max timestamps, row ranges, and presence stats for pruning.
  - Filters: Compact structures for pre-read pruning:
    - XOR: <uid>_<field>.xf (approximate membership)
    - Enum Bitmap (EBM): <uid>_<field>.ebm (eq/neq for enums)
    - Zone SuRF: <uid>_<field>.zsrf (succinct range filter for >, >=, <, <=)
  - Offsets/Index: Per-zone compressed offsets (.zfc files) describing compressed block ranges and in-block offsets.
- Snapshots (optional):
  - Event Snapshots (.snp): portable arrays of events with a binary header + length‑prefixed JSON entries.
  - Snapshot Metadata (.smt): arrays of {uid, context_id, from_ts, to_ts} entries with a binary header + length‑prefixed JSON.
- Publication: Segment creation is atomic at the directory level; once complete, readers can discover and scan it.
See the diagram below:
Sizing example:
flush_threshold = 32_768
events_per_zone = 2_048
- A full flush of 32,768 events creates exactly 16 zones. Each zone has its own metadata and contributes field values to the filter files. Larger events_per_zone values reduce metadata overhead but offer coarser pruning; smaller values increase pruning precision at the cost of more metadata.
5) Cleanup and WAL compaction
- What: After a successful flush, the system can prune or rotate old WAL files up to a cutoff corresponding to flushed data.
- Why: Keeps recovery time short and disk usage bounded.
End-to-end write example
- Client sends STORE signup ... with a valid payload.
- The engine validates the event against the signup schema.
- The event is appended to the WAL for shard 3 (durability).
- The event is inserted into shard 3’s active MemTable.
- When the MemTable reaches flush_threshold, it is swapped and the old one is queued for the background flush.
- The flush worker writes segment-00137/ with column files, 16 zones (if 32,768/2,048), zone metadata, XOR filters, Zone SuRF filters, and offsets/index.
- Once published, queries immediately see the segment alongside any newer in-memory events.
- The WAL up to (and including) the flushed range is now safe to compact or rotate.
Failure model (write path)
- Crash before WAL append: The event is lost (not acknowledged).
- Crash after WAL append but before MemTable insert: The event is recovered from the WAL and re-applied on startup.
- Crash after MemTable insert but before flush: The event is not yet in a segment, but it is durable in the WAL. On restart, WAL replay restores it to the MemTable; if a swap occurred and a passive MemTable existed, its contents are reconstructed from WAL as well. No data loss; no duplicate segments.
- Crash during flush: The WAL still contains the flushed events; on restart, the system replays or completes the flush. Partially written segments are ignored until a valid, fully published segment is present.
Tuning the write path
- shards: More shards increase parallelism of WAL, MemTable, and flush pipelines (at the cost of higher CPU and RAM usage and more files and directories).
- flush_threshold: Controls MemTable size. Higher values reduce flush frequency (bigger segments) but increase peak memory and WAL replay cost.
- events_per_zone: Smaller values improve pruning for reads but increase metadata and filter counts. Pick based on query selectivity and typical field cardinalities.
Durability & Recovery
- Covered in the write path: WAL append is the durability point; replay restores MemTables; WAL rotation keeps recovery bounded. See Failure model above.
Backpressure & Safety
- Bounded channels between components provide backpressure under load (writers slow down instead of exhausting memory).
- Async workers (flush and compaction) are throttled so foreground writes and reads stay responsive.
This is the spine of the engine: durable append, fast memory, immutable segments with rich metadata, and just enough background work to keep reads snappy as data grows.
Read-time Projection & Column Pruning
- The query planner derives a minimal column set to load based on:
  - Core fields: context_id, event_type, timestamp (always loaded)
  - Filter fields used in WHERE
  - Requested payload fields from RETURN [ ... ] (if provided)
- If RETURN is omitted or empty (RETURN []), all payload fields are considered eligible.
- Unknown fields in RETURN are ignored (schema-driven).
- Only the selected columns are mmap’d and read; others are skipped entirely, reducing I/O and memory.
- Projection decisions are logged under the query::projection target for debugging.
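For instance, assuming an order_created schema whose payload has status, amount, and currency fields, the query below loads exactly five columns: context_id, event_type, and timestamp (core), status (used in WHERE), and amount (requested via RETURN); the currency column is never read:

QUERY order_created WHERE status = "confirmed" RETURN [amount]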
Query and Replay
Overview
SnelDB reads come in two flavors:
- QUERY: filter one event type by predicates, time, and optional context_id; may span shards.
- REPLAY: stream all events for one context_id (optionally one type) in original append order; single shard.
Both use the same internals as the write path: in‑memory MemTable, on‑disk immutable segments, per‑segment zones, and compact per‑field filters.
When to Use Which
- Use QUERY for analytics, debugging slices, and ad‑hoc filters across many contexts.
- Use REPLAY to rebuild state or audit the exact sequence for one context.
Examples
- QUERY
  - Investigate: "All order_created over $100 in the last 24h across all users"
  - Dashboard: "Errors by type this week"
  - Debug: "Sessions with status = 'pending' and retries > 3"
- REPLAY
  - Operational debugging (incident timeline): REPLAY system_event FOR host-123 SINCE "2024-05-01T00:00:00Z"
  - Auditing/compliance (full account trail): REPLAY FOR account-42 SINCE "2024-01-01T00:00:00Z"
  - ML pipelines (rebuild a customer’s transaction sequence): REPLAY transaction FOR user-456 SINCE "2023-01-01T00:00:00Z"
  - Product journey (single user or session in order): REPLAY FOR user-123
Command Cheatsheet
QUERY <event_type> [FOR <context_id>] [SINCE <ts>] [WHERE <expr>] [LIMIT <n>]
REPLAY [<event_type>] FOR <context_id> [SINCE <ts>]
More examples: Query and Replay
How It Works
QUERY (step‑by‑step)
- Parse and validate inputs.
- Plan shard tasks (fan‑out unless narrowed by context_id).
- Per shard, scan MemTable and pick relevant segments.
- Prune zones by time and per‑field filters; read only needed columns.
  - Range predicates (>, >=, <, <=) are pruned using Zone SuRF ({uid}_{field}.zsrf) when present, falling back to XOR/EBM only if unavailable. SuRF is an order‑preserving trie using succinct arrays for fast range overlap checks.
- Evaluate predicates and apply the WHERE condition.
- Merge shard results; apply global LIMIT if set.
REPLAY (step‑by‑step)
- Parse and validate inputs.
- Route to the shard owning the context_id.
- Scan MemTable and relevant segments for that context.
- Apply optional event_type and SINCE filters.
- Stream events in original append order.
See the diagram:
What You Get
- Visibility: fresh writes are visible from the MemTable before flush.
- Ordering: REPLAY preserves append order (single shard). QUERY has no global ordering unless you explicitly sort at merge (costly) or scope the query narrowly.
- LIMIT (QUERY): short‑circuit per shard when possible; always cap globally during merge.
Performance Tips
- Prune early: favor event_type, context_id, and SINCE to skip zones fast.
- Shard wisely: more shards increase scan parallelism but cost more on fan‑out.
Tuning
- events_per_zone: smaller zones = better pruning, more metadata; larger zones = fewer skips, less metadata.
- flush_threshold: affects how much is in memory vs on disk, and segment cadence.
- Shard count: match to CPU and expected concurrency.
Invariants
- Immutability: events and segments are never edited in place.
- Single‑shard replay: each context_id maps to exactly one shard.
- Schema validity: stored payloads conform to their event type schema.
- Atomic publication: new segments become visible all‑or‑nothing.
Further Reading
SnelDB’s read path is simple to reason about: prune aggressively, read only what you need, and merge efficiently—whether you’re slicing across many contexts or replaying one.
Sharding
What it is
Sharding is how SnelDB scales ingestion and keeps per-context replay efficient. Instead of one big pipeline, the system runs multiple shard workers side by side. Each context_id is deterministically mapped to a shard, so all events for that context live together.
Core pieces
- Shard Manager — owns all shards and routes work to them by hashing context_id.
- Shard (worker) — long‑lived task that owns a WAL, active/passive MemTables, a flush queue, and the shard’s segment list. Processes Store, Query, Replay, and Flush messages.
- Messages — typed messages delivered to each shard: Store, Query, Replay, Flush.
- Backpressure — each shard has a bounded mailbox; when it fills, senders wait. Hot shards slow down independently without affecting others.
How it works
- Startup
  - The manager creates N shards (configurable) and starts one worker per shard.
  - Each shard ensures its storage directories exist, recovers its MemTable from its WAL, loads existing segment IDs, and starts background services (flush, compaction).
- Store
  - Hash context_id → pick shard → send Store (see the sketch after this list).
  - The shard appends to its WAL, updates the in‑memory MemTable, and, when the MemTable reaches its threshold, rotates it to a passive buffer and enqueues a flush.
- Query
  - Broadcast to all shards. Each shard scans its in‑memory state and on‑disk segments and returns matches. Results are merged.
- Replay
  - Single‑shard. The manager routes to the shard that owns the context_id. The shard streams events in order for that context.
- Flush
  - Manual Flush is broadcast to all shards. Each shard rotates its active MemTable and enqueues a flush to create a new segment.
  - Automatic flush also occurs when a shard’s MemTable reaches its configured threshold during ingestion.
Why this design
- Locality: all events for a context_id stay on one shard → fast, single‑shard replay.
- Parallelism: shards work independently → ingestion and queries scale with cores.
- Isolation: hot shards apply backpressure locally without stalling the whole system.
- Simplicity: shards don’t coordinate directly; only query results are merged.
Invariants
- Same context_id → always the same shard.
- Within a shard, event order per context_id is preserved.
- Shards never share mutable state; cross‑shard communication happens via message passing and result merging.
Operational notes
- Number of shards controls parallelism; increase to utilize more CPU cores.
- Flush threshold tunes memory usage vs. write amplification; lower values flush more often.
- On startup, shards recover from their WALs before serving traffic; compaction runs in the background to control segment growth.
Further Reading
- A deep dive into WAL or flush internals (see Storage Engine).
- Query planning details (see Query & Replay).
- Compaction policies (see Compaction).
Sharding is the concurrency backbone: it divides the work, keeps replay cheap, and prevents overload by applying backpressure shard by shard.
Infrastructure
SnelDB isn’t just a storage engine — it needs the scaffolding around it to feel safe, predictable, and easy to integrate. That’s what the infrastructure layer provides.
Configuration
Every system needs a single source of truth for its settings. SnelDB loads a configuration once at startup and makes it available everywhere. This means:
- Consistency — all components (server, engine, WAL, logging) read from the same snapshot.
- Flexibility — settings can be changed through a config file or environment variable without recompiling.
- Safety — startup fails fast if something critical is missing or invalid.
Think of it as the contract between how you run SnelDB and how the engine behaves.
Logging
Logs are the “black box recorder” of SnelDB. They serve two purposes:
- For operators: real-time feedback in the console (levels like info/debug/warn).
- For long-term visibility: structured logs rotated daily on disk.
The philosophy is simple: logs should be human-readable, lightweight, and always available when you need to explain “what just happened.”
Responses
Every command produces a response. SnelDB keeps them minimal and predictable:
- A clear status code (OK, BadRequest, NotFound, InternalError).
- A message for humans.
- A body that can be either lines (for CLI-like tools) or structured JSON arrays (for programmatic use).
Two renderers handle the output: one friendly for terminals, one clean for machines. This way, SnelDB speaks both languages without complicating the core.
Why it matters
These pieces aren’t “extra code” — they’re the glue that makes SnelDB usable in the real world:
- Configuration means you can run the same binary in development, staging, and production with confidence.
- Logging means you can trust the system to tell you what it’s doing, even when things go wrong.
- Responses mean every client, from shell scripts to dashboards, gets consistent feedback.
Together, they provide the operational safety net: when you store events, you know how to configure it, you see what’s happening, and you get a clear answer back.
Compaction
What it is
Compaction keeps reads predictable as data grows. Instead of editing files in place, SnelDB periodically merges small, freshly-flushed segments into larger, cleaner ones. This reduces file count, tightens zone metadata, and improves pruning—without touching the logical history of events.
Why it matters
- Fewer segments → fewer seeks and better cache behavior.
- Larger, well-formed zones → more “skip work” during queries.
- Stable tail latencies as ingestion continues day after day.
How it runs (big picture)
- One background task per shard.
- Wakes up at a configurable interval.
- Checks disk/IO pressure; if the system is busy, skips this round.
- Looks at the shard’s segment index to decide if compaction is needed.
- When needed, launches a compaction worker to perform the merge and publish new segments atomically.
Shard-local by design
Each shard compacts its own segments. This keeps the work isolated, prevents cross-shard coordination, and preserves the “all events for a context live together” property.
When it triggers
- On a fixed timer (compaction_interval).
- Only if the segment index reports that thresholds are met (e.g., too many L0s, size/age criteria).
- Skips entirely if IO pressure is high to avoid hurting foreground work.
Safety & correctness
- Segments are immutable; compaction writes new files and then swaps pointers in one step.
- If a run fails, nothing is partially applied; the old segments remain authoritative.
- Reads continue throughout—queries see either the old set or the new set, never a half state.
- Replay order and event immutability are unaffected.
Resource awareness
- The loop samples system state (disks/IO) before running.
- Under pressure, the compactor yields to ingestion and queries.
- This protects P99 read latencies and avoids “compaction storms.”
What the worker does (conceptually)
- Selects a compaction set (often recent small segments).
- Merges column files in order, rebuilding zones, filters, and indexes.
- Emits a new, leveled segment and updates the segment index.
- Schedules old segments for deletion/GC.
Operator knobs
- compaction_interval: how often to check.
- compaction_threshold: when the segment index should say “yes, compact.”
- sys_io_threshold (and related IO heuristics): how conservative to be under load.
- events_per_zone and flush_threshold: influence zone granularity and L0 creation rate (tune together).
Invariants
- No in-place mutation; only append/replace at the segment set level.
- Queries stay available and correct while compaction runs.
- Failures are contained to the background task; foreground paths remain healthy.
What this page is not
- A file-format spec or merge algorithm walkthrough.
- A policy recipe for every workload. The defaults aim for good general behavior; heavy write or read-mostly deployments may tune the thresholds differently.
Layering Strategy in SnelDB
This page gives a high-level view of how SnelDB is structured. It focuses on what each layer does and how requests flow through the system.
Layer 1: frontend — Transport and Connections
- Listens for client connections (e.g., Unix/TCP/HTTP).
- Reads requests and writes responses.
- Hands off parsing and execution to the command and engine layers.
Layer 2: command — Parse and Dispatch
- Parses user input (e.g., DEFINE, STORE, QUERY).
- Validates and turns text into typed commands.
- Dispatches to the appropriate operation in the engine.
Layer 3: engine — Core Logic
- Implements the main behaviors: define schemas, store events, run queries, replay, and flush.
- Chooses the right shard and updates on-disk data as needed.
- Stays independent from how clients connect or send requests.
Layer 4: shared — Common Utilities
- Configuration and response types used across layers.
- Logging setup and other small shared helpers.
Flow Summary (STORE example)
- Frontend receives a request.
- command parses and validates it.
- The dispatcher routes to the correct engine operation.
- engine executes and updates storage.
- A response is returned to the client.
Why this layering?
- Clean separation: parsing, logic, and transport are independent.
- Easy to test: engine logic can be tested without real sockets.
- Scales well: clear boundaries support growth and optimization.
Threading and Async
What it is
- Networking is handled with async tasks (Tokio) for each client connection.
- Work is executed by per-shard worker tasks, communicated via message passing.
- This separates I/O from data processing and keeps shard state isolated.
Core pieces
- Frontends — Unix/TCP/HTTP listeners accept connections and spawn a task per client.
- Connection — reads lines, parses commands, and dispatches them for execution.
- Shard Manager — owns shards and routes work by hashing context_id.
- Shard (worker) — long‑lived task that owns WAL, MemTables, flush queue, and segment list; handles Store, Query, Replay, Flush.
- Channels — tokio::sync::mpsc for sending typed messages to shards (a small sketch follows the list below).
- Schema Registry — shared via Arc<tokio::sync::RwLock<SchemaRegistry>>.
How it works
- Startup
  - Initialize the schema registry and shard manager.
  - Bind a Unix listener and start accepting connections.
  - Spawn background workers (flush, compaction) per shard.
- Connection handling
  - Spawn a task per client.
  - Read lines, parse into commands, dispatch to the shard manager.
- Store
  - Route to shard by context_id.
  - Append to WAL, update active MemTable; rotate and enqueue flush when needed.
- Query
  - Broadcast to all shards.
  - Each shard scans its in‑memory and on‑disk state and returns matches; results are merged.
- Replay
  - Route to the shard for the context_id.
  - Stream events in original append order for that context.
- Flush
  - Broadcast; shards rotate MemTables and enqueue flush to produce a new segment.
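A minimal sketch of that message-passing shape on the tokio runtime (the message enum, payload type, and mailbox size are illustrative, not SnelDB’s real definitions):

use tokio::sync::mpsc;

// Illustrative message type; real shard messages carry more detail.
enum ShardMsg {
    Store { context_id: String, payload: String },
    Flush,
}

#[tokio::main]
async fn main() {
    // Bounded mailbox: when it fills up, senders wait (backpressure).
    let (tx, mut rx) = mpsc::channel::<ShardMsg>(1024);

    // One long-lived worker task owns the shard's state.
    let worker = tokio::spawn(async move {
        while let Some(msg) = rx.recv().await {
            match msg {
                ShardMsg::Store { context_id, payload } => {
                    // A real worker would append to the WAL, update the MemTable, maybe enqueue a flush.
                    println!("store for {context_id}: {payload}");
                }
                ShardMsg::Flush => println!("rotate MemTable and enqueue flush"),
            }
        }
    });

    tx.send(ShardMsg::Store {
        context_id: "user-42".into(),
        payload: r#"{"plan":"pro"}"#.into(),
    })
    .await
    .unwrap();
    tx.send(ShardMsg::Flush).await.unwrap();
    drop(tx); // close the channel so the worker loop ends
    worker.await.unwrap();
}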
Why this design
- Async I/O: efficient, scalable handling of many connections.
- Shard workers: clear ownership and predictable performance.
- Separation of concerns: networking and storage logic don’t intermingle.
Invariants
- Frontends do not perform disk I/O or modify indexes directly.
- Shard workers own shard state; cross‑shard mutable sharing is avoided.
- Schema access uses async RwLock for safe concurrent reads/writes.
Operational notes
- Bounded shard mailboxes apply local backpressure; tune channel sizes as needed.
- Number of shards controls parallelism; size to match CPU/core availability.
- Monitor channel depth and lock contention to spot hotspots.
Further Reading
Logging
What it is
- SnelDB uses the tracing ecosystem for structured, leveled logs.
- Levels and output directory are configured via the config file.
Core pieces
- Initializer — sets up tracing_subscriber layers for stdout and file.
- Config — [logging] section controls log_dir, stdout_level, and file_level.
- Levels — error, warn, info, debug, trace.
How it works
- Startup
  - logging::init() is called from main.rs before starting frontends.
  - Reads CONFIG.logging to build filters and writers.
  - Installs two layers: ANSI stdout and file appender (sneldb.log, daily rotation).
- Emitting logs
  - Use tracing::{error!, warn!, info!, debug!, trace!} in code (see the sketch below).
  - Prefer spans (e.g., #[instrument]) to capture context around operations.
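A minimal usage sketch with the tracing macros and #[instrument]; the function and field names are illustrative, and tracing_subscriber::fmt() stands in for the real logging::init():

use tracing::{info, instrument, warn};

// The span records the arguments, so every log line inside carries the context_id.
#[instrument]
fn handle_store(context_id: &str, payload_len: usize) {
    info!(payload_len, "validated payload, appending to WAL");
    if payload_len == 0 {
        warn!("empty payload");
    }
}

fn main() {
    tracing_subscriber::fmt().init(); // plain stdout subscriber for the sketch
    handle_store("user-42", 128);
}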
Configuration
Example snippet from config.toml:
[logging]
log_dir = "../data/logs"
stdout_level = "debug"
file_level = "error"
- stdout_level: global level for console logs.
- file_level: global level for file logs.
- log_dir: directory where sneldb.log is created (daily rotation).
Why this design
- Structured logs with levels and spans ease debugging and operations.
- Separate stdout/file control supports local development and production hygiene.
Operational notes
- Tune levels per environment (e.g., stdout_level=warn in prod).
- Ensure log_dir exists and is writable; it is created on first write by the appender.
- Use targets when necessary to scope logs for noisy modules.
Further Reading
- tracing crate docs
- tracing_subscriber filters and formatters
File Formats and Data Layout
What it is
- The on-disk layout for shards and segments, and the binary formats used for columns, offsets, zone metadata, indexes, and schemas.
- These formats are append-friendly, read-optimized, and simple to parse with memory maps.
Core pieces
- Segments — segment-xxxxx/ directories under each shard.
- Columns — {uid}_{field}.col files storing values with length prefixes.
- Zone Compressed Offsets — {uid}_{field}.zfc files listing per-zone compressed block metadata and in-block offsets.
- Zone Metadata — {uid}.zones containing per-zone min/max timestamps and row ranges.
- Zone Index — {uid}.idx mapping context_id values to zone ids.
- XOR Filters — {uid}_{field}.xf per-field filters for fast membership tests.
- Enum Bitmap Indexes — {uid}_{field}.ebm per-enum-field bitmaps for zone pruning.
- Zone SuRF Filters — {uid}_{field}.zsrf per-field per-zone succinct range filters for range pruning.
- Schemas — schema/schemas.bin append-only records of event type schemas and UIDs.
Binary headers
- All binary files now begin with a fixed, 20-byte header to improve safety and detect corruption.
- Header layout (little-endian):
- 8 bytes: MAGIC (ASCII tag identifying file kind)
- 2 bytes: VERSION (u16)
- 2 bytes: FLAGS (u16)
- 4 bytes: RESERVED (u32)
- 4 bytes: HEADER_CRC32 (u32) computed over MAGIC+VERSION+FLAGS+RESERVED
- WAL logs remain newline-delimited JSON without a binary header.
Magic strings per file kind:
- Columns (.col): EVDBCOL\0
- Zone Compressed Offsets (.zfc): EVDBZCF\0
- Zone Metadata (.zones): EVDBZON\0
- Zone Index (.idx per-UID/context): EVDBUID\0
- XOR Filters (.xf): EVDBXRF\0
- Zone SuRF Filters (.zsrf): EVDBZSF\0
- Shard Segment Index (segments.idx): EVDBSIX\0
- Schemas (schemas.bin): EVDBSCH\0
- Enum Bitmap Index (.ebm): EVDBEBM\0
- Event Snapshots (.snp): EVDBSNP\0
- Snapshot Metadata (.smt): EVDBSMT\0
Compatibility and migration:
- Readers tolerate legacy files that lack headers and continue to parse them.
- New writers always prepend the header.
- A future strict mode may enforce headers on read.
Directory layout
data/
├── cols/
│ ├── shard-0/
│ │ └── segment-00000/
│ │ ├── {uid}_{field}.col
│ │ ├── {uid}_{field}.zfc
│ │ ├── {uid}.zones
│ │ ├── {uid}.idx
│ │ ├── {uid}_{field}.xf
│ │ ├── {uid}_{field}.zsrf
│ │ └── {uid}_{field}.ebm
│ └── shard-1/
│ └── segment-00000/
├── logs/
│ └── sneldb.log.YYYY-MM-DD
└── schema/
└── schemas.bin
Snapshots are ad-hoc utility files and can be written anywhere (not tied to the segment layout). Typical usage writes them to a caller-provided path.
Column files: {uid}_{field}.col
- Format per value (binary):
  - File begins with a binary header (MAGIC EVDBCOL\0).
  - [u16] little-endian length
  - [bytes] UTF‑8 string of the value
- Access pattern: memory-mapped and sliced using offsets.
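A rough reading sketch that follows the stated layout (20-byte header, then repeated [u16 length][bytes] records); it uses a plain read instead of mmap, skips error handling, and is not the engine’s actual reader:

use std::fs;

// Parse a .col file: skip the fixed 20-byte header, then read
// length-prefixed UTF-8 values until the end of the file.
fn read_col(path: &str) -> Vec<String> {
    let bytes = fs::read(path).expect("read .col file");
    let mut values = Vec::new();
    let mut pos = 20; // fixed binary header
    while pos + 2 <= bytes.len() {
        let len = u16::from_le_bytes([bytes[pos], bytes[pos + 1]]) as usize;
        pos += 2;
        if pos + len > bytes.len() {
            break; // truncated tail: stop gracefully
        }
        values.push(String::from_utf8_lossy(&bytes[pos..pos + len]).into_owned());
        pos += len;
    }
    values
}

fn main() {
    for value in read_col("segment-00000/u01_plan.col") {
        println!("{value}");
    }
}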
Zone compressed offsets: {uid}_{field}.zfc
- Binary layout per zone (repeated):
  - File begins with a binary header (MAGIC EVDBZOF\0).
  - [u32] zone_id
  - [u32] count: number of offsets
  - [u64] * count: byte offsets into the corresponding .col
- Purpose: enables loading only the rows for a given zone by first reading and decompressing the zone block, then slicing values using in-block offsets.
Zone metadata: {uid}.zones
- Bincode-encoded Vec<ZoneMeta>.
- File begins with a binary header (MAGIC EVDBZON\0).
- Fields:
  - zone_id: u32
  - uid: String
  - segment_id: u64
  - start_row: u32
  - end_row: u32
  - timestamp_min: u64
  - timestamp_max: u64
Zone index: {uid}.idx
- Binary map of event_type -> context_id -> [zone_id...].
- Used to quickly locate candidate zones by context_id.
- Written via ZoneIndex::write_to_path and read with ZoneIndex::load_from_path.
- File begins with a binary header (MAGIC EVDBUID\0).
XOR filters: {uid}_{field}.xf
- Bincode-serialized BinaryFuse8 filter over unique field values.
- Used for fast approximate membership checks during planning.
- File begins with a binary header (MAGIC EVDBXRF\0).
Zone SuRF filters: {uid}_{field}.zsrf
- Bincode-serialized ZoneSurfFilter containing Vec<ZoneSurfEntry>.
- Purpose: zone-level range pruning for numeric, string, and boolean fields using a succinct trie.
- File begins with a binary header (MAGIC EVDBZSF\0).
- Contents:
  - entries: Vec<ZoneSurfEntry>, where each entry is { zone_id: u32, trie: SurfTrie }.
  - SurfTrie stores compact arrays of degrees, child offsets, labels, and terminal flags.
- Built during flush/compaction by ZoneWriter::write_all.
- Used by ZoneFinder for Gt/Gte/Lt/Lte operations before falling back to XOR/EBM.
- Naming mirrors .xf/.ebm: per uid and field.
Enum bitmap index: {uid}_{field}.ebm
- Zone-level bitmaps per enum variant for fast Eq/Neq pruning.
- File begins with a binary header (MAGIC EVDBEBM\0).
- Binary layout:
  - [u16] variant_count
  - Repeated variant_count times:
    - [u16] name_len
    - [bytes] variant_name (UTF‑8)
  - [u16] rows_per_zone
  - Repeated per zone present in the file:
    - [u32] zone_id
    - [u16] variant_count_again
    - Repeated variant_count_again times:
      - [u32] bitmap_len_bytes
      - [bytes] packed_bitmap (LSB-first within a byte; bit i set ⇒ row i has this variant)
- Usage: on a filter plan = "pro", prune zones where the pro bitmap is all zeros; similarly for != by checking any non-target variant has a bit set (see the sketch below).
- Observability: use convertor ebm <segment_dir> <uid> <field> to dump a JSON view of per-zone row positions per variant.
Schemas: schema/schemas.bin
- Append-only file of bincode-encoded SchemaRecord entries:
  - uid: String
  - event_type: String
  - schema: MiniSchema
- Loaded at startup by SchemaRegistry.
- File begins with a binary header (MAGIC EVDBSCH\0).
Shard segment index: segments.idx
- Bincode-encoded Vec<SegmentEntry>; file begins with a binary header (MAGIC EVDBSIX\0).
Why this design
- Immutable segments + append-only metadata simplify recovery and concurrency.
- Memory-mappable, length-prefixed encodings keep parsing simple and fast.
- Separate files per concern (values, offsets, metadata, indexes) enable targeted IO.
Operational notes
- Segment directories are named segment-00000, segment-00001, ...
- UIDs are per-event-type identifiers generated at DEFINE; filenames use {uid}, not the event type.
- New fields simply create new .col/.zfc/.xf files in subsequent segments.
Further Reading
Event snapshots: *.snp
- Purpose: portable bundles of events (potentially mixed types) for export, testing, or replay.
- File begins with a binary header (MAGIC EVDBSNP\0).
- Binary layout after header:
  - [u32] num_events
  - Repeated num_events times:
    - [u32] len_bytes
    - [bytes] JSON-serialized Event (same schema as the API/Event struct)
- Notes:
  - Events are serialized as JSON for compatibility (payloads can contain arbitrary JSON values).
  - Readers stop gracefully on truncated data (warn and return successfully with the parsed prefix).
Snapshot metadata: *.smt
- Purpose: describes snapshot ranges per (uid, context_id) with min/max timestamps.
- File begins with a binary header (MAGIC EVDBSMT\0).
- Binary layout after header:
  - [u32] num_records
  - Repeated num_records times:
    - [u32] len_bytes
    - [bytes] JSON-serialized SnapshotMeta { uid, context_id, from_ts, to_ts }
- Notes:
  - JSON is used for the same reasons as snapshots (arbitrary strings/IDs, forward-compat fields).
  - Readers stop gracefully on truncated data (warn and return successfully with the parsed prefix).