Quickstart

Installation

test command

Running SnelDB

The easiest way to get hands-on is the embedded Playground.

  • Start SnelDB (dev config enables the Playground by default):
    • server.http_addr = "127.0.0.1:8085"
    • [playground] enabled = true
  • Open http://127.0.0.1:8085/ in your browser.
  • Type commands like:
DEFINE subscription FIELDS { "id": "int", "plan": "string" }
STORE subscription FOR ctx1 PAYLOAD {"id":1,"plan":"free"}
STORE subscription FOR ctx2 PAYLOAD {"id":2,"plan":"pro"}
QUERY subscription WHERE id=1

Notes

  • The UI posts raw command lines to POST /command (no JSON API required).
  • Set server.output_format to text (terminal-like) or json (toggle pretty in UI).
  • To disable the Playground, set [playground] enabled = false.
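
For reference, the Playground-related settings might look like this in config.toml (a minimal sketch: the key names come from the notes above, but the exact layout and values are illustrative):

[server]
http_addr = "127.0.0.1:8085"
output_format = "text"   # or "json"

[playground]
enabled = true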

A Gentle Guide for Engineers

SnelDB is built to be small and simple. It keeps track of what happened, in order, and makes it easy to get those facts back out quickly. That’s it. This guide will walk you through how to think about events, how to design them so they’re useful, and how to use SnelDB’s tiny set of commands—DEFINE, STORE, QUERY, and REPLAY. Along the way we’ll use a retail shop as an example, but the same ideas apply in many domains.

Quick start: the easiest way to try SnelDB is to open the embedded Playground at http://127.0.0.1:8085/ (enabled by default in dev). Type commands directly and see results immediately.

Why events?

An event is just a record that something happened: an order was created, a customer signed up, a parcel was delivered. Events don’t change once they’re stored. By keeping them all, you get a trustworthy history. Your application can look back, replay them, and figure out the current state whenever it needs. SnelDB focuses on storing these events and letting you fetch them again quickly. The “what do these events mean?” part stays in your application.

Two ways of reading

With SnelDB, there are really only two ways you read:

  1. Replay a timeline for one thing. All the events for a single context_id (like an order, a customer, or a device) form a story. If you REPLAY FOR order-9001, you’ll see every event for that order in sequence. Your code can fold those into the current state.

  2. Query across many things. Sometimes you don’t want the whole story of one order, you want a slice across all orders. For that, you use QUERY. For example: QUERY order_created WHERE status="submitted". Behind the scenes, SnelDB uses tricks like enum bitmaps and filters to make those queries quick, so you don’t have to think about indexes.

If you remember one thing: replay for one thing’s story, query for slices across many things.

Choosing a context

So what is this context_id? Think of it as “whose story am I telling?” For a retail system:

  • An order has a start and an end, so it makes sense to use order-<id> as the context.
  • Inventory belongs to a SKU, so sku-<code> is a context.
  • A customer profile belongs to a customer, so customer-<id> works.

When you want to be able to say “show me everything that ever happened to X”, that X should be a context.

Designing an event

Name events the way you’d explain them to a teammate: order_created, customer_registered, shipment_delivered. Keep the payload small and clear. Always include:

  • The IDs you’ll need to filter by later (order_id, customer_id, sku).
  • Enums for fixed sets of values. For example:
    "plan": ["basic", "pro", "enterprise"]
    
  • A timestamp for when it happened.

Here are a few examples:

DEFINE customer_registered FIELDS {
  "customer_id":"string",
  "email":"string",
  "plan":["basic","pro","enterprise"],
  "created_at":"timestamp"
}

DEFINE order_created FIELDS {
  "order_id":"string",
  "customer_id":"string",
  "status":["pending","submitted","cancelled"],
  "created_at":"timestamp"
}

DEFINE shipment_delivered FIELDS {
  "shipment_id":"string",
  "order_id":"string",
  "carrier":["UPS","DHL","FedEx"],
  "delivered_at":"timestamp"
}

Storing events

The very first need is to record facts: something happened, and you want to keep it. Writing an event in SnelDB is just that—adding a new fact to the timeline.

STORE customer_registered FOR customer-123
  PAYLOAD {"customer_id":"123","email":"a@b.com","plan":"pro"}

STORE order_created FOR order-9001
  PAYLOAD {"order_id":"9001","customer_id":"123","status":"pending"}

STORE shipment_delivered FOR ship-5001
  PAYLOAD {"shipment_id":"5001","order_id":"9001","carrier":"UPS"}

Later on, when dealing with retries or external systems, you might add optional fields like idempotency_key. But the heart of storing events is simply: write down the fact.

Reading events

If you want to know the current state of one thing, replay its story:

REPLAY FOR order-9001

If you want to know which events match a condition across many things, query:

QUERY order_created WHERE customer_id="123"

If you need to follow a chain—like from an order to its shipment—query by the keys you included in the payload:

QUERY shipment_delivered WHERE order_id="9001"

How to evolve

SnelDB is built on immutability. Once an event is stored it never changes. If the shape of an event needs to change, we don’t edit old events or add fields onto them. Instead, we create a new version of the schema or define a new event type that represents the new shape.

Older events remain valid and replayable; newer ones follow the updated schema. This way, every event clearly shows which version of the schema it follows, and your code can handle old and new versions side by side. Immutability guarantees that history is stable, while evolution ensures you can keep writing new chapters without breaking the old ones.
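
For example, suppose order_created needs an optional note field. Rather than changing the original definition, you could register a new version of the schema (a sketch using the versioned DEFINE form covered in the Commands chapter; the note field is illustrative):

DEFINE order_created AS 2 FIELDS {
  "order_id":"string",
  "customer_id":"string",
  "status":["pending","submitted","cancelled"],
  "created_at":"timestamp",
  "note":"string | null"
}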

Scaling without extra knobs

You don’t manage indexes or query planners. You simply design your events with the right fields. SnelDB takes care of compression and filtering internally. If a query feels heavy, ask yourself: did I include the right key in the payload?

Streaming

If you need near‑real‑time processing, you don’t need a new command. Just poll periodically with SINCE, or with a condition on your own timestamp field:

QUERY order_created WHERE created_at >= "2025-09-07T00:00:00Z" LIMIT 1000

Keep track of the last event you saw in your application and continue from there.
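
In practice that looks like repeated queries whose starting point advances as you go (the timestamps below are illustrative):

QUERY order_created SINCE "2025-09-07T00:00:00Z" LIMIT 1000
QUERY order_created SINCE "2025-09-07T00:05:00Z" LIMIT 1000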

Other domains

  • Billing: replay a subscription’s events to learn its current plan; query invoices or payments by customer_id.
  • IoT: replay one device’s events to see its config; query telemetry since last night.
  • Logistics: replay one parcel’s journey; query all parcels delivered today.

What SnelDB won’t do

SnelDB will never enforce your workflows, run aggregates, or decide who is allowed to see data. Those belong in your application or other tools. SnelDB’s job is narrower: keep facts safe, and give them back quickly.

A closing picture

Think of two simple moves:

  • Down: replay the whole story for one thing.
  • Across: query slices across many things.

Nearly everything you need can be done by combining these two moves. The database is small on purpose. If you design your events carefully, SnelDB will give you speed and reliability without ever getting in your way.

SnelDB Design Philosophy

Hassle-free by design

SnelDB is small on purpose. You don’t need to learn dozens of commands, fiddle with query planners, or manage indexes. Four simple verbs—DEFINE, STORE, QUERY, REPLAY—cover almost everything you need. Less to remember, less to break.

Immutability at the core

Facts don’t change once written. Immutability makes your history reliable and auditable. If things evolve, you add new event types or new schema versions. Old events remain intact; new ones live alongside them.

Evolution over correction

Rather than patching or rewriting, you let the story grow. Each new event is another page in the log. That makes timelines honest, reproducible, and easy to debug.

Performance without knobs

SnelDB is built for performance, but you don’t need to manage any of it. Internally, it uses shards to spread load, an LSM-tree design to keep writes fast, and columnar storage with enum bitmaps and XOR filters to make queries efficient. You never have to tune these parts yourself—they just work in the background so you can focus on your application.

Universal patterns

Two simple movements cover most use cases:

  • Replay one context’s timeline to rebuild its state.
  • Query across many contexts with filters.

This model is the same whether you’re preparing order data in retail, collecting device signals in IoT, managing subscriptions in SaaS, or feeding clean event streams into data and AI/ML teams for training and analysis.

Staying in its lane

SnelDB doesn’t do business logic, aggregations, or access control. Those belong in your services and tools. The database’s job is to keep track of everything faithfully and give it back quickly.

Introduction

SnelDB is a lightweight, high‑performance database for immutable events. You append facts, then filter or replay them—quickly and reliably.

What it is

  • Store: append events with a type, context_id, timestamp, and payload
  • Query: filter by event type, context, time, and conditions
  • Replay: stream events for a context in original order
DEFINE payment FIELDS {"amount":"int","status":"string"}
STORE payment FOR user-123 PAYLOAD {"amount":250,"status":"verified"}
QUERY payment WHERE status="verified"
REPLAY FOR user-123

Why it exists

General-purpose databases and queues struggle with large, evolving event logs. SnelDB is built for:

  • Immutable, append-only data
  • Fast filtering at scale (columnar + pruning)
  • Ordered replay per context
  • Minimal ops overhead

Key features

  • Append-only storage (perfect audit trails; predictable recovery)
  • Simple, human‑readable commands (JSON‑native)
  • Fast queries at scale (shards, zones, compaction)
  • Replay built in (time‑travel debugging, sequence modeling)
  • Flexible schemas (strict validation; optional fields)
  • Lightweight & safe (Rust; embeddable; no GC)

Who it’s for

  • Product analytics and auditing
  • ML pipelines on event sequences
  • Operational debugging and timeline reconstruction

Why SnelDB?

Most databases were never built for events.

They're optimized for records that change: customer profiles, inventory counts, order statuses. But in the real world, especially in modern systems and data pipelines, we’re dealing more and more with things that happened — not things that are.

  • A user signed up.
  • A sensor pinged.
  • A document was approved.
  • A model prediction was stored.

These aren’t updates. They’re facts. Immutable. Time-stamped. Contextual.

The gap

If you’ve tried to build on top of these kinds of events, you've probably run into one of these:

  • Slow queries over millions of records because you're using a general-purpose SQL database
  • Too much ceremony: it’s painful to rebuild a timeline of actions (what happened, when, and in what order)
  • Custom tooling just to read back historical behavior
  • Mixing logs and storage (Kafka for ingest, S3 for storage, Athena for queries… and duct tape in between)
  • Hard to filter, trace, or correlate things once the data grows

And if you work in AI or data science, you’ve probably dealt with brittle pipelines, long joins, and the question:

“How do I get all the events for this user/session/date range — and trust the output?”

The idea

SnelDB was born to make event-driven storage and retrieval feel natural — for developers, data engineers, and model builders alike.

It’s a database designed from scratch for:

  • Immutable, append-only data
  • High-throughput ingest
  • Fast filtering and replay
  • Event-type-aware columnar storage
  • Schema evolution without migrations
  • Minimal operational overhead

You store events. You query them. You replay them. That’s it. It does the rest — segmenting, zoning, indexing, compaction — in the background.

Why not just use X?

  • Kafka? Great for streaming, not for historical querying.
  • PostgreSQL? Fantastic RDBMS, but not built for multi-billion-row event logs.
  • Snowflake? Powerful, but heavy and expensive for interactive filtering.
  • ClickHouse? Blazing fast, but not optimized for replay semantics and evolving schemas.

SnelDB is a sweet spot: light like SQLite, fast like ClickHouse, event-native like Kafka — but simple to reason about.

Built for builders

Whether you’re:

  • Building product analytics dashboards from raw event logs
  • Tracking user behavior over time, across sessions or contexts
  • Training machine learning models on real-world event sequences
  • Auditing critical flows or investigating anomalies
  • Archiving time-stamped data for compliance or reporting
  • Creating time-travel debugging tools or operational replay systems

SnelDB gives you a clean, reliable foundation to work with immutable facts — fast to store, easy to query, and simple to reason about.

Simple to embed. Easy to query. Scales with clarity.

That’s why we built SnelDB.

Stories from the field

To see why SnelDB exists, it helps to look at a few real situations where traditional tools fall short.

  • Product analytics at scale: A growing SaaS company wants to track how users move through their app. At first, PostgreSQL is fine. But soon the tables balloon into billions of rows. Queries slow to a crawl, analysts create brittle pipelines, and nobody fully trusts the numbers. With SnelDB, they could store clean, immutable event streams, filter them quickly by context, and build dashboards that actually stay fast as volume grows.

  • Machine learning pipelines: A data science team trains fraud detection models using transaction histories. They struggle to rebuild consistent training sets: data is scattered across Kafka topics, S3 buckets, and ad-hoc SQL queries. With SnelDB, they can reliably fetch “all sequences of events leading to flagged outcomes,” ensuring reproducibility and shortening the path from raw logs to usable training data.

  • Auditing in regulated industries: A fintech startup needs to prove to auditors what happened, when, and by whom. Traditional databases allow updates and deletes, which introduces doubt. SnelDB’s append-only design guarantees that past events remain untouched, making it straightforward to demonstrate compliance with minimal operational effort.

  • Operational debugging: An infrastructure engineer gets paged at 2am for a production outage. Logs are rotated, metrics are sampled, and the picture is incomplete. With SnelDB, they can replay the exact sequence of system events leading up to the failure, reconstruct the timeline, and pinpoint the root cause without guesswork.

What is SnelDB?

SnelDB is a lightweight, high-performance database designed for immutable events.

At its core, it’s a system where you can:

  • Store events in an append-only fashion
  • Query them efficiently by type, context, or time
  • Replay them in order to understand what happened

That’s it. No updates. No deletes. Just fast, reliable access to a growing stream of facts.

Not your average database

SnelDB is not a general-purpose relational database, a message broker, or a data lake. It’s a specialized tool focused on event-driven data:

  • Unlike a relational database, SnelDB doesn’t model changing rows. It treats data as a log of things that happened.
  • Unlike a message queue, it’s built for storage and querying, not just delivery.
  • Unlike a data warehouse, it’s lightweight and easy to embed in everyday applications.

Think of it as a database that embraces time and immutability as first-class concepts.

A mental model

The easiest way to think about SnelDB is:

  • A notebook for your system’s history: every line is a fact, recorded once, never erased.
  • A timeline you can slice and filter: events are grouped by type, context, and time, so you can quickly zoom in.
  • A replay button: if you need to reconstruct a past sequence, you can ask SnelDB to play it back in order.

A simple example

Imagine you’re building a payments system.

You might store events like:

{ "event_type": "payment_initiated", "context_id": "user_123",  "payload" : { "amount": 100 }, "timestamp": "2025-08-20T09:30:00Z" }
{ "event_type": "payment_verified",  "context_id": "user_123",  "payload" : { "amount": 100 }, "timestamp": "2025-08-20T09:31:00Z" }
{ "event_type": "payment_settled",   "context_id": "user_123", "payload" : { "amount": 100 }, "timestamp": "2025-08-20T09:35:00Z" }

Later, you might want to:

  • Fetch all payment_initiated events from last week
  • Replay all events for user_123 in order
  • Filter for verified payments over $500

And maybe even more:

  • Compare the average settlement time for all payments last month
  • Find all users who initiated a payment but never settled
  • Retrieve the full sequence of events for a disputed transaction
  • Generate a distribution of payment amounts across different countries
  • Train a model using all past transactions, keeping the exact order of events intact

In a traditional setup, you’d stitch together logs, SQL queries, and custom scripts. With SnelDB, these queries are first-class citizens. For example:

QUERY payment_initiated SINCE 2025-08-01

or:

REPLAY FOR user_123

even like:

QUERY payment_verified WHERE amount > 500

Instead of thinking in terms of tables and joins, you think in terms of events. SnelDB is designed so the way you ask matches the way you think: “What happened? When? For whom?”

Key Features

SnelDB is small in surface area but powerful in practice. Here are the highlights that make it different:

1. Append-only storage

Events are immutable. Once stored, they’re never updated or deleted — which means:

  • Perfect audit trails
  • Predictable replay of past behavior
  • No risk of hidden mutations breaking analysis

2. Simple, human-readable commands

No SQL boilerplate. No obscure APIs. SnelDB has a compact command language that reads like plain English:

DEFINE payment FIELDS { "amount": "int", "status": "string" }
STORE payment FOR user-123 PAYLOAD {"amount": 250, "status":"verified"}
QUERY payment WHERE status="verified"
REPLAY FOR user-123

Fast to learn. Easy to remember. Case-insensitive and JSON-native.

3. Fast queries at scale

Under the hood, SnelDB uses an LSM-tree design with:

  • Shards for parallelism
  • Zones and filters to skip irrelevant data
  • Compaction to keep reads efficient over time

The result: queries stay snappy whether you have thousands or billions of events.

4. Replay built in

You don’t just query — you can replay events in order:

REPLAY order_created FOR customer-42

This makes debugging, time-travel analysis, and sequence modeling natural parts of the workflow.

5. Flexible schemas

SnelDB supports schema definitions per event type, with:

  • Strict validation: payloads must match fields
  • Optional fields: declared as string | null
  • Clear errors when something doesn’t line up

This keeps data trustworthy without slowing you down.
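
For instance, an optional field is declared with a union type (the note field here is just an example):

DEFINE payment FIELDS { "amount": "int", "status": "string", "note": "string | null" }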

6. Designed for AI & analytics

Because events are ordered, immutable, and replayable, SnelDB is a natural fit for:

  • Training models on real-world sequences
  • Feeding pipelines with reproducible datasets
  • Analyzing behavior over time without complex joins
  • Auditing decision processes with confidence

7. Lightweight & embeddable

SnelDB is written in Rust with minimal dependencies. It runs anywhere — from a laptop dev setup to production servers — without heavyweight orchestration.

You can drop it into your stack as a focused, reliable event database.

Built-in Playground

SnelDB ships with a tiny single-page Playground so you can try commands without any client code. Start the server and open http://127.0.0.1:8085/ to experiment with DEFINE, STORE, QUERY, and REPLAY instantly.

8. Safety by design

SnelDB is built in Rust, which brings memory safety, thread safety, and performance without garbage collection.

This means:

  • No segfaults or memory leaks corrupting your data
  • Concurrency without data races
  • Predictable performance, even under load

When you’re storing critical events, safety is not optional — and Rust helps guarantee it from the ground up.


In short: SnelDB is designed to be small but sharp — a tool that does one thing well: make working with immutable events simple, fast, and reliable.

Commands

SnelDB has a compact, human-friendly command language. Keywords are case-insensitive (store, STORE, StOrE all work). Event type names and context IDs are case-preserving.

Core verbs:

  • DEFINE — declare a schema for an event type
  • STORE — append a new event with a JSON payload
  • QUERY — filter events
  • REPLAY — stream events in original order (per context, optionally per type)
  • FLUSH — force a memtable → segment flush
  • PING — health check

If a command returns no rows, you’ll see: No matching events found.

See pages below for full syntax and examples.

Syntax & Operators

Command forms

DEFINE <event_type> FIELDS { "key": "type", … }

STORE  <event_type> FOR <context_id> PAYLOAD <json_object>

QUERY  <event_type> [FOR <context_id>] [SINCE <ts>] [WHERE <expr>] [LIMIT <n>]

REPLAY [<event_type>] FOR <context_id> [SINCE <ts>]

FLUSH
  • Keywords: case-insensitive.
  • Literals:
    • Strings: double-quoted ("NL", "a string").
    • Numbers: unquoted (42, 3, 900).
    • Booleans: true, false (unquoted).
  • WHERE operators: =, !=, >, >=, <, <=, AND, OR, NOT.
  • Precedence: NOT > AND > OR. Parentheses are not part of the current grammar; structure conditions to rely on this precedence instead.
  • LIMIT: positive integer; caps returned rows.
  • SINCE: ISO-8601 timestamp string (e.g., 2025-08-01T00:00:00Z). When present, only events at/after this instant are considered.

Mini-grammar (informal)

expr      := cmp | NOT expr | expr AND expr | expr OR expr
cmp       := field op value
op        := = | != | > | >= | < | <=
value     := string | number | boolean
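
Because NOT binds tighter than AND, and AND binds tighter than OR, a mixed condition groups accordingly (a worked example; the second line uses parentheses only to show the grouping and is not valid command syntax):

WHERE NOT status="cancelled" AND amount >= 40 OR currency = "EUR"

reads as: ((NOT status="cancelled") AND amount >= 40) OR currency = "EUR"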

Examples

DEFINE order_created AS 1 FIELDS {
  id: "uuid",
  amount: "float",
  currency: "string"
}

STORE order_created FOR ctx_123 PAYLOAD {
  "id": "a1-b2",
  "amount": 42.5,
  "currency": "EUR",
  "tags": ["new","vip"],
  "flag": true
}

QUERY order_created FOR "ctx_123" SINCE "2025-08-01T00:00:00Z"
WHERE amount >= 40 AND currency = "EUR"
LIMIT 100

REPLAY order_created FOR ctx_123 SINCE "2025-08-01T00:00:00Z"

DEFINE

Purpose

Register the schema for an event type. STORE payloads must conform to this schema.

Form

DEFINE <event_type:WORD> [ AS <version:NUMBER> ] FIELDS { "key_1": "type_1", ... }

Field pairs

  • Keys can be STRING or WORD. The parser will quote WORD keys when converting to JSON.
  • Values (types) can be:
    • STRING literals, for example: "int", "string", "string | null"
    • ARRAY of strings to define an enum, for example: ["pro", "basic"]
      • Enum variants are case-sensitive ("Pro" != "pro")
  • Schema must be flat (no nested objects).

Examples

DEFINE order_created FIELDS { "order_id": "int", "status": "string" }
DEFINE review FIELDS { rating: "int", verified: "bool" }
DEFINE order_created AS 2 FIELDS { order_id: "int", status: "string", note: "string | null" }
DEFINE subscription FIELDS { plan: ["pro", "basic"] }

Typical validation errors raised during STORE

  • No schema defined
  • Missing field status in payload
  • Field order_id is expected to be one of int, but got String
  • Payload contains fields not defined in schema: invalid_field
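
For instance, given the first schema above, sending order_id as a string instead of an int is rejected (a hypothetical exchange; the exact response formatting may differ):

STORE order_created FOR customer-1 PAYLOAD {"order_id":"123","status":"confirmed"}
Field order_id is expected to be one of int, but got String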

STORE

Purpose

Append an event for a specific context.

Form

STORE <event_type:WORD> FOR <context_id:WORD or STRING> PAYLOAD {"key":"value", ...}

Constraints

  • <context_id> can be a WORD (example: user-1) or a quoted STRING.
  • PAYLOAD must be a flat JSON object (no nested objects).
  • PAYLOAD must follow the schema defined with the DEFINE command.

Examples

STORE order_created FOR customer-1 PAYLOAD {"order_id":123,"status":"confirmed"}
STORE review FOR "user:ext:42" PAYLOAD {"rating":5,"verified":true}
STORE login FOR user-7 PAYLOAD {"device":"android"}

Behavior

  • Validates payload against the schema of the event type.
  • Rejects missing or extra fields and type mismatches.
  • Durability-first: once acknowledged, the event will survive crashes.

Errors

  • <event_type> cannot be empty
  • <context_id> cannot be empty
  • Schema validation errors (see DEFINE)
  • Overload/backpressure (rare): Shard is busy, try again later

QUERY

Purpose

Filter events by type, optionally by context, time, predicate, and limit.

Form

QUERY <event_type:WORD>
  [ FOR <context_id:WORD or STRING> ]
  [ SINCE <timestamp:STRING> ]
  [ RETURN [ <field:WORD or STRING>, ... ] ]
  [ WHERE <expr> ]
  [ LIMIT <n:NUMBER> ]

Examples

QUERY order_created WHERE status="confirmed"
QUERY order_created WHERE status=confirmed
QUERY order_created WHERE id > 13 AND id < 15
QUERY order_created WHERE country!="NL"
QUERY order_created WHERE country="NL" OR country="FR"
QUERY login FOR user-1 WHERE device="android"
QUERY payment SINCE "2025-08-01T00:00:00Z" WHERE amount >= 500 LIMIT 100
QUERY product RETURN [name, "price"] WHERE price > 10

Notes

  • SINCE is a STRING timestamp.
  • RETURN [ ... ] limits the payload fields included in results. Omit to return all payload fields. An empty list RETURN [] also returns all payload fields.
  • Field names in RETURN can be bare words or quoted strings.
  • Works across in-memory and on-disk segments.
  • If nothing matches, returns: No matching events found.

Gotchas

  • Field names used in WHERE must exist in the schema for that event type.
  • Strings must be double-quoted when you need explicit string literals.
  • Unknown fields in RETURN are ignored; only schema-defined payload fields (plus core fields context_id, event_type, timestamp) are returned.

REPLAY

Purpose

Stream events back in their original append order for a context, optionally restricted to one event type.

Form

REPLAY [ <event_type:WORD> ]
  FOR <context_id:WORD or STRING>
  [ SINCE <timestamp:STRING> ]
  [ RETURN [ <field:WORD or STRING>, ... ] ]

Variants

  • All event types:
REPLAY FOR <context_id>
  • Only a specific event type:
REPLAY <event_type> FOR <context_id>

Examples

REPLAY FOR alice
REPLAY order_shipped FOR customer-99
REPLAY FOR "user:ext:42" SINCE "2025-08-20T09:00:00Z"
REPLAY product FOR user-1 RETURN ["name"]

Behavior

  • Routes to the shard owning the context ID.
  • Preserves original order.
  • If nothing matches: No matching events found.
  • RETURN [ ... ] limits payload fields in the replayed events. Omit or use RETURN [] to include all payload fields. Unknown fields are ignored; core fields (context_id, event_type, timestamp) are always present.

FLUSH

Purpose

Force a memtable flush into an immutable segment.

Form

FLUSH

Notes

Useful for tests, checkpoints, or when you want on-disk segments immediately. Not required for correctness; ingestion continues during flush.
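
For example, a test might force freshly stored events onto disk before querying them (assuming the order_created schema from the DEFINE page):

STORE order_created FOR customer-1 PAYLOAD {"order_id":1,"status":"pending"}
FLUSH
QUERY order_created WHERE status="pending"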

Design

Overview

What this section is

  • A short tour of how SnelDB works inside: the big components and how data flows between them.
  • Enough context for contributors to find their bearings without reading the whole codebase first.

The big picture

  • Commands enter via frontends (TCP/UNIX/HTTP) and are parsed, validated, and dispatched.
  • Writes go through a WAL for durability, land in an in-memory table, and get flushed into immutable segments on disk.
  • Reads (query/replay) scan the in-memory table and segments, skipping as much as possible using zone metadata and filters.
  • Background compaction keeps segments tidy so read performance stays predictable.
  • Sharding by context_id spreads work and makes per-context replay cheap.

Lifecycle at a glance

  • DEFINE: register or update the schema for an event type (used to validate STORE).
  • STORE: validate payload → append to WAL → apply to MemTable → later flush to a new segment.
  • QUERY: fan out to shards, prune zones and project only needed columns, evaluate predicates, merge results.
  • REPLAY: route to the shard for the context_id, stream events in original append order (optionally narrowed by event type).
  • FLUSH: force a MemTable flush to produce a new immutable segment (useful in tests/checkpoints).

What runs where

  • Commands and flow control: command/parser, command/dispatcher, command/handlers.
  • Storage engine: engine/core/* for WAL, memory, segments, zones, filters; engine/store, engine/query, engine/replay.
  • Sharding and concurrency: engine/shard/* (manager, worker, messages).
  • Background work: engine/compactor/* for segment merging and cleanup.
  • Wiring and I/O: frontend/* listeners; shared/* for config, responses, logging.

Key guarantees (high level)

  • Durability once a STORE is acknowledged (WAL first).
  • Immutability of events and on-disk segments (compaction replaces whole files, never edits in place).
  • Ordered replay per context_id.
  • Schema-validated payloads (strict by default, optional fields via union types).
  • Bounded memory via shard-local backpressure.

What this section doesn’t do

  • It won’t dive into file formats or algorithmic details; those live in the focused pages that follow.
  • It won’t prescribe ops/production practices; see the development/operations parts of the book.

How to use this section

Skim this page, then jump to the piece you’re touching.

That’s the map. Next pages keep the same tone and size: just enough to guide you to the right code.

Core concepts

  • Event: time-stamped, immutable fact with a typed payload
  • Event type & schema: defined via DEFINE, validates payload shape
  • Context: groups related events under a context_id
  • Shard: independent pipeline — WAL → MemTable → Flush → Segments
  • WAL: per-shard durability log; replayed on startup
  • MemTable: in-memory buffer; flushed when full
  • Segment: immutable on-disk unit with columns, zones, filters, indexes
  • Zone: fixed-size block inside a segment with pruning metadata
  • Compaction: merges small segments to keep reads predictable

Storage Engine

Overview

The storage engine turns incoming events into durable, immutable data you can query quickly. It’s built around append-only writes, in-memory buffering, and on-disk segments that are efficient to scan and easy to skip.

Core Components

  • WAL (write-ahead log): Per-shard durability log. Every accepted event is appended here first.
  • MemTable: In-memory buffer for recent events. Fast inserts; swapped out when full.
  • Flush worker: Converts a full MemTable into an immutable on-disk segment in the background.
  • Segments: On-disk building blocks (columns, zone metadata, filters, lightweight indexes).
  • Snapshots: Optional utility files (.snp events, .smt metadata) for export/replay and range bookkeeping.
  • Compactor (covered later): Merges small segments into larger ones to keep reads predictable.

Write Path (At a Glance)

  1. Validate payload against the event type schema.
  2. Append to the WAL (durability point).
  3. Apply to the MemTable (fast in-memory structure).
  4. When the MemTable hits a threshold, swap it out and enqueue a background flush.
  5. Flush worker writes a new segment and publishes it atomically.

See the diagram below:

Write path

Write Path (In Depth)

0) Validate the event

  • What: Check the incoming payload against the registered schema for its event_type.
  • Why: Ensures only well-formed data enters the system so downstream files and indexes remain consistent.

Example:

{
  "timestamp": 1700000000,
  "event_type": "signup",
  "context_id": "user-42",
  "payload": { "plan": "pro", "country": "US" }
}

Equivalent command:

STORE signup FOR user-42 PAYLOAD {"plan":"pro","country":"US"}

Validation ensures required fields exist and types are correct (for example, the event_type is known and a "plan" is provided in the payload).

1) Append to the WAL (durability point)

  • What: Append the validated event to the per-shard Write-Ahead Log (WAL).
  • Why: Once the append returns, the event will survive a crash. On restart, the system replays WAL entries to rebuild in-memory state and complete any interrupted flushes.
  • Notes:
    • WAL records are lightweight, line-oriented appends (JSON-serialized per line).
    • WAL files rotate in sync with the MemTable flush threshold (engine.flush_threshold), so replay windows are bounded by flush points. After a successful flush, older WAL files up to that cutoff can be pruned.
    • Behavior is tunable via config: [wal] enabled, dir, buffered, buffer_size, flush_each_write, fsync, fsync_every_n and [engine] flush_threshold.
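
Put together, these knobs might appear in config.toml roughly like this (a sketch: the key names come from the note above, but every value shown is an illustrative assumption, not a documented default):

[wal]
enabled = true
dir = "../data/wal"
buffered = true
buffer_size = 8192
flush_each_write = false
fsync = true
fsync_every_n = 100

[engine]
flush_threshold = 32768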

Crash safety example:

  • If the process crashes after the WAL append but before the event hits memory, recovery will re-insert it into the MemTable on startup.

2) Insert into the MemTable (fast in-memory apply)

  • What: Place the event into the in-memory, append-friendly, queryable buffer (MemTable).
  • Why: Absorb writes in memory to batch them into large, sequential segment writes (avoids random I/O), maintain backpressure with bounded memory, and maximize ingest throughput. As a secondary benefit, new events are immediately visible to queries.
  • Behavior:
    • The MemTable is sized by flush_threshold (config). When it reaches capacity, it triggers a swap and a background flush.
    • Inserts are grouped by context so the flusher can scan them quickly.

Small example:

  • flush_threshold = 4
  • Incoming events (in order): A, B, C, D, E
    • A, B, C, D go into the active MemTable. After D, the MemTable is full.
    • A background flush is enqueued for these four; a fresh MemTable becomes active.
    • E enters the new MemTable immediately (no blocking on the background flush).

3) Swap and enqueue a background flush

  • What: When the active MemTable is full, it’s atomically swapped for a fresh, empty one, and the full snapshot is queued for flushing.
  • Why: Writers remain responsive (no long I/O in the foreground) and the system maintains bounded memory.
  • Details:
    • The passive MemTable (now immutable) is handed off to the flush worker.
    • Writes proceed into the newly created active MemTable.

4) Flush worker writes a new immutable segment

  • What: The background worker turns the passive MemTable into an on-disk segment directory (for example, segment-00042/).
  • Inside the segment:
    • Column files: One file per field, optimized for sequential appends and later memory-mapped (mmap) access. Naming: <uid>_<field>.col. Example: u01_timestamp.col, u01_event_type.col, u01_context_id.col, u01_plan.col, u01_country.col, where <uid> is defined per event type.
    • Zone metadata: Per-zone min/max timestamps, row ranges, and presence stats for pruning.
    • Filters: Compact structures for pre-read pruning:
      • XOR: <uid>_<field>.xf (approximate membership)
      • Enum Bitmap (EBM): <uid>_<field>.ebm (eq/neq for enums)
      • Zone SuRF: <uid>_<field>.zsrf (succinct range filter for >, >=, <, <=)
    • Offsets/Index: Per-zone compressed offsets (.zfc files) describing compressed block ranges and in-block offsets.
    • Snapshots (optional):
      • Event Snapshots (.snp): portable arrays of events with a binary header + length‑prefixed JSON entries.
      • Snapshot Metadata (.smt): arrays of {uid, context_id, from_ts, to_ts} entries with a binary header + length‑prefixed JSON.
  • Publication: Segment creation is atomic at the directory level; once complete, readers can discover and scan it.

See the diagram below:

UID storage example

Sizing example:

  • flush_threshold = 32_768
  • events_per_zone = 2_048
  • A full flush of 32,768 events creates exactly 16 zones. Each zone has its own metadata and contributes field values to the filter files. Larger events_per_zone values reduce metadata overhead but offer coarser pruning; smaller values increase pruning precision at the cost of more metadata.

5) Cleanup and WAL compaction

  • What: After a successful flush, the system can prune or rotate old WAL files up to a cutoff corresponding to flushed data.
  • Why: Keeps recovery time short and disk usage bounded.

End-to-end write example

  1. Client sends STORE signup ... with a valid payload.
  2. The engine validates the event against the signup schema.
  3. The event is appended to the WAL for shard 3 (durability).
  4. The event is inserted into shard 3’s active MemTable.
  5. When the MemTable reaches flush_threshold, it is swapped and the old one is queued for the background flush.
  6. The flush worker writes segment-00137/ with column files, 16 zones (if 32,768/2,048), zone metadata, XOR filters, Zone SuRF filters, and offsets/index.
  7. Once published, queries immediately see the segment alongside any newer in-memory events.
  8. The WAL up to (and including) the flushed range is now safe to compact or rotate.

Failure model (write path)

  • Crash before WAL append: The event is lost (not acknowledged).
  • Crash after WAL append but before MemTable insert: The event is recovered from the WAL and re-applied on startup.
  • Crash after MemTable insert but before flush: The event is not yet in a segment, but it is durable in the WAL. On restart, WAL replay restores it to the MemTable; if a swap occurred and a passive MemTable existed, its contents are reconstructed from WAL as well. No data loss; no duplicate segments.
  • Crash during flush: The WAL still contains the flushed events; on restart, the system replays or completes the flush. Partially written segments are ignored until a valid, fully published segment is present.

Tuning the write path

  • shards: More shards increase parallelism of the WAL, MemTable, and flush pipelines (at the cost of higher CPU and RAM usage and more files and directories).
  • flush_threshold: Controls MemTable size. Higher values reduce flush frequency (bigger segments) but increase peak memory and WAL replay cost.
  • events_per_zone: Smaller values improve pruning for reads but increase metadata and filter counts. Pick based on query selectivity and typical field cardinalities.

Durability & Recovery

  • Covered in the write path: WAL append is the durability point; replay restores MemTables; WAL rotation keeps recovery bounded. See Failure model above.

Backpressure & Safety

  • Bounded channels between components provide backpressure under load (writers slow down instead of exhausting memory).
  • Async workers (flush and compaction) are throttled so foreground writes and reads stay responsive.

This is the spine of the engine: durable append, fast memory, immutable segments with rich metadata, and just enough background work to keep reads snappy as data grows.

Read-time Projection & Column Pruning

  • The query planner derives a minimal column set to load based on:
    • Core fields: context_id, event_type, timestamp (always loaded)
    • Filter fields used in WHERE
    • Requested payload fields from RETURN [ ... ] (if provided)
  • If RETURN is omitted or empty (RETURN []), all payload fields are considered eligible.
  • Unknown fields in RETURN are ignored (schema-driven).
  • Only the selected columns are mmap’d and read; others are skipped entirely, reducing I/O and memory.
  • Projection decisions are logged under the query::projection target for debugging.
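
As a concrete example (field names follow the order_created schema used earlier), the query below touches only five columns:

QUERY order_created RETURN [order_id] WHERE status="confirmed"

The planner loads context_id, event_type, and timestamp (core fields), status (used in WHERE), and order_id (requested via RETURN); all other payload columns are skipped.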

Query and Replay

Overview

SnelDB reads come in two flavors:

  • QUERY: filter one event type by predicates, time, and optional context_id; may span shards.
  • REPLAY: stream all events for one context_id (optionally one type) in original append order; single shard.

Both use the same internals as the write path: in‑memory MemTable, on‑disk immutable segments, per‑segment zones, and compact per‑field filters.

When to Use Which

  • Use QUERY for analytics, debugging slices, and ad‑hoc filters across many contexts.
  • Use REPLAY to rebuild state or audit the exact sequence for one context.

Examples

  • QUERY

    • Investigate: "All order_created over $100 in the last 24h across all users"
    • Dashboard: "Errors by type this week"
    • Debug: "Sessions with status = 'pending' and retries > 3"
  • REPLAY

    • Operational debugging (incident timeline)
      REPLAY system_event FOR host-123 SINCE "2024-05-01T00:00:00Z"
      
    • Auditing/compliance (full account trail)
      REPLAY FOR account-42 SINCE "2024-01-01T00:00:00Z"
      
    • ML pipelines (rebuild a customer’s transaction sequence)
      REPLAY transaction FOR user-456 SINCE "2023-01-01T00:00:00Z"
      
    • Product journey (single user or session in order)
      REPLAY FOR user-123
      

Command Cheatsheet

QUERY <event_type> [FOR <context_id>] [SINCE <ts>] [WHERE <expr>] [LIMIT <n>]
REPLAY [<event_type>] FOR <context_id> [SINCE <ts>]

More examples: Query and Replay

How It Works

QUERY (step‑by‑step)

  1. Parse and validate inputs.
  2. Plan shard tasks (fan‑out unless narrowed by context_id).
  3. Per shard, scan MemTable and pick relevant segments.
  4. Prune zones by time and per‑field filters; read only needed columns.
    • Range predicates (>, >=, <, <=) are pruned using Zone SuRF ({uid}_{field}.zsrf) when present, falling back to XOR/EBM only if unavailable. SuRF is an order‑preserving trie using succinct arrays for fast range overlap checks.
  5. Evaluate predicates and apply WHERE condition.
  6. Merge shard results; apply global LIMIT if set.

REPLAY (step‑by‑step)

  1. Parse and validate inputs.
  2. Route to the shard owning the context_id.
  3. Scan MemTable and relevant segments for that context.
  4. Apply optional event_type and SINCE filters.
  5. Stream events in original append order.

See the diagram:

Query and Replay flow

What You Get

  • Visibility: fresh writes are visible from MemTable before flush.
  • Ordering: REPLAY preserves append order (single shard). QUERY has no global ordering unless you explicitly sort at merge (costly) or scope the query narrowly.
  • LIMIT (QUERY): short‑circuit per shard when possible; always cap globally during merge.

Performance Tips

  • Prune early: favor event_type, context_id, and SINCE to skip zones fast.
  • Shard wisely: more shards increase scan parallelism but cost more on fan‑out.

Tuning

  • events_per_zone: smaller zones = better pruning, more metadata; larger zones = fewer skips, less metadata.
  • flush_threshold: affects how much is in memory vs on disk, and segment cadence.
  • Shard count: match to CPU and expected concurrency.

Invariants

  • Immutability: events and segments are never edited in place.
  • Single‑shard replay: each context_id maps to exactly one shard.
  • Schema validity: stored payloads conform to their event type schema.
  • Atomic publication: new segments become visible all‑or‑nothing.

Further Reading

SnelDB’s read path is simple to reason about: prune aggressively, read only what you need, and merge efficiently—whether you’re slicing across many contexts or replaying one.

Sharding

What it is

Sharding is how SnelDB scales ingestion and keeps per-context replay efficient. Instead of one big pipeline, the system runs multiple shard workers side by side. Each context_id is deterministically mapped to a shard, so all events for that context live together.

Core pieces

  • Shard Manager — owns all shards and routes work to them by hashing context_id.
  • Shard (worker) — long‑lived task that owns a WAL, active/passive MemTables, a flush queue, and the shard’s segment list. Processes Store, Query, Replay, and Flush messages.
  • Messages — typed messages delivered to each shard: Store, Query, Replay, Flush.
  • Backpressure — each shard has a bounded mailbox; when it fills, senders wait. Hot shards slow down independently without affecting others.

Sharding overview

How it works

  • Startup

    • The manager creates N shards (configurable) and starts one worker per shard.
    • Each shard ensures its storage directories exist, recovers its MemTable from its WAL, loads existing segment IDs, and starts background services (flush, compaction).
  • Store

    • Hash context_id → pick shard → send Store.
    • The shard appends to its WAL, updates the in‑memory MemTable, and, when the MemTable reaches its threshold, rotates it to a passive buffer and enqueues a flush.
  • Query

    • Broadcast to all shards. Each shard scans its in‑memory state and on‑disk segments and returns matches. Results are merged.
  • Replay

    • Single‑shard. The manager routes to the shard that owns the context_id. The shard streams events in order for that context.
  • Flush

    • Manual Flush is broadcast to all shards. Each shard rotates its active MemTable and enqueues a flush to create a new segment.
    • Automatic flush also occurs when a shard’s MemTable reaches its configured threshold during ingestion.

Why this design

  • Locality: all events for a context_id stay on one shard → fast, single‑shard replay.
  • Parallelism: shards work independently → ingestion and queries scale with cores.
  • Isolation: hot shards apply backpressure locally without stalling the whole system.
  • Simplicity: shards don’t coordinate directly; only query results are merged.

Invariants

  • Same context_id → always the same shard.
  • Within a shard, event order per context_id is preserved.
  • Shards never share mutable state; cross‑shard communication happens via message passing and result merging.

Operational notes

  • Number of shards controls parallelism; increase to utilize more CPU cores.
  • Flush threshold tunes memory usage vs. write amplification; lower values flush more often.
  • On startup, shards recover from their WALs before serving traffic; compaction runs in the background to control segment growth.

Further Reading

Sharding is the concurrency backbone: it divides the work, keeps replay cheap, and prevents overload by applying backpressure shard by shard.

Infrastructure

SnelDB isn’t just a storage engine — it needs the scaffolding around it to feel safe, predictable, and easy to integrate. That’s what the infrastructure layer provides.

Configuration

Every system needs a single source of truth for its settings. SnelDB loads a configuration once at startup and makes it available everywhere. This means:

  • Consistency — all components (server, engine, WAL, logging) read from the same snapshot.
  • Flexibility — settings can be changed through a config file or environment variable without recompiling.
  • Safety — startup fails fast if something critical is missing or invalid.

Think of it as the contract between how you run SnelDB and how the engine behaves.

Logging

Logs are the “black box recorder” of SnelDB. They serve two purposes:

  • For operators: real-time feedback in the console (levels like info/debug/warn).
  • For long-term visibility: structured logs rotated daily on disk.

The philosophy is simple: logs should be human-readable, lightweight, and always available when you need to explain “what just happened.”

Responses

Every command produces a response. SnelDB keeps them minimal and predictable:

  • A clear status code (OK, BadRequest, NotFound, InternalError).
  • A message for humans.
  • A body that can be either lines (for CLI-like tools) or structured JSON arrays (for programmatic use).

Two renderers handle the output: one friendly for terminals, one clean for machines. This way, SnelDB speaks both languages without complicating the core.

Why it matters

These pieces aren’t “extra code” — they’re the glue that makes SnelDB usable in the real world:

  • Configuration means you can run the same binary in development, staging, and production with confidence.
  • Logging means you can trust the system to tell you what it’s doing, even when things go wrong.
  • Responses mean every client, from shell scripts to dashboards, gets consistent feedback.

Together, they provide the operational safety net: when you store events, you know how to configure it, you see what’s happening, and you get a clear answer back.

Compaction

What it is

Compaction keeps reads predictable as data grows. Instead of editing files in place, SnelDB periodically merges small, freshly-flushed segments into larger, cleaner ones. This reduces file count, tightens zone metadata, and improves pruning—without touching the logical history of events.

Why it matters

  • Fewer segments → fewer seeks and better cache behavior.
  • Larger, well-formed zones → more “skip work” during queries.
  • Stable tail latencies as ingestion continues day after day.

How it runs (big picture)

  • One background task per shard.
  • Wakes up at a configurable interval.
  • Checks disk/IO pressure; if the system is busy, skips this round.
  • Looks at the shard’s segment index to decide if compaction is needed.
  • When needed, launches a compaction worker to perform the merge and publish new segments atomically.

Shard-local by design

Each shard compacts its own segments. This keeps the work isolated, prevents cross-shard coordination, and preserves the “all events for a context live together” property.

When it triggers

  • On a fixed timer (compaction_interval).
  • Only if the segment index reports that thresholds are met (e.g., too many L0s, size/age criteria).
  • Skips entirely if IO pressure is high to avoid hurting foreground work.

Safety & correctness

  • Segments are immutable; compaction writes new files and then swaps pointers in one step.
  • If a run fails, nothing is partially applied; the old segments remain authoritative.
  • Reads continue throughout—queries see either the old set or the new set, never a half state.
  • Replay order and event immutability are unaffected.

Resource awareness

  • The loop samples system state (disks/IO) before running.
  • Under pressure, the compactor yields to ingestion and queries.
  • This protects P99 read latencies and avoids “compaction storms.”

What the worker does (conceptually)

  • Selects a compaction set (often recent small segments).
  • Merges column files in order, rebuilding zones, filters, and indexes.
  • Emits a new, leveled segment and updates the segment index.
  • Schedules old segments for deletion/GC.

Operator knobs

  • compaction_interval: how often to check.
  • compaction_threshold: when the segment index should say “yes, compact.”
  • sys_io_threshold (and related IO heuristics): how conservative to be under load.
  • events_per_zone and flush_threshold: influence zone granularity and L0 creation rate (tune together).
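
As a rough configuration sketch (the key names come from the list above; the values, units, and placement in the config file are assumptions for illustration only):

compaction_interval = 60
compaction_threshold = 8
sys_io_threshold = 80
events_per_zone = 2048
flush_threshold = 32768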

Invariants

  • No in-place mutation; only append/replace at the segment set level.
  • Queries stay available and correct while compaction runs.
  • Failures are contained to the background task; foreground paths remain healthy.

What this page is not

  • A file-format spec or merge algorithm walkthrough.
  • A policy recipe for every workload. The defaults aim for good general behavior; heavy write or read-mostly deployments may tune the thresholds differently.

Layering Strategy in SnelDB

This page gives a high-level view of how SnelDB is structured. It focuses on what each layer does and how requests flow through the system.

Layer 1: frontend — Transport and Connections

  • Listens for client connections (e.g., Unix/TCP/HTTP).
  • Reads requests and writes responses.
  • Hands off parsing and execution to the command and engine layers.

Layer 2: command — Parse and Dispatch

  • Parses user input (e.g., DEFINE, STORE, QUERY).
  • Validates and turns text into typed commands.
  • Dispatches to the appropriate operation in the engine.

Layer 3: engine — Core Logic

  • Implements the main behaviors: define schemas, store events, run queries, replay, and flush.
  • Chooses the right shard and updates on-disk data as needed.
  • Stays independent from how clients connect or send requests.

Layer 4: shared — Common Utilities

  • Configuration and response types used across layers.
  • Logging setup and other small shared helpers.

Flow Summary (STORE example)

  1. Frontend receives a request.
  2. command parses and validates it.
  3. The dispatcher routes to the correct engine operation.
  4. engine executes and updates storage.
  5. A response is returned to the client.

Why this layering?

  • Clean separation: parsing, logic, and transport are independent.
  • Easy to test: engine logic can be tested without real sockets.
  • Scales well: clear boundaries support growth and optimization.

Threading and Async

What it is

  • Networking is handled with async tasks (Tokio) for each client connection.
  • Work is executed by per-shard worker tasks, communicated via message passing.
  • This separates I/O from data processing and keeps shard state isolated.

Core pieces

  • Frontends — Unix/TCP/HTTP listeners accept connections and spawn a task per client.
  • Connection — reads lines, parses commands, and dispatches them for execution.
  • Shard Manager — owns shards and routes work by hashing context_id.
  • Shard (worker) — long‑lived task that owns WAL, MemTables, flush queue, and segment list; handles Store, Query, Replay, Flush.
  • Channels — tokio::sync::mpsc for sending typed messages to shards.
  • Schema Registry — shared via Arc<tokio::sync::RwLock<SchemaRegistry>>.

How it works

  • Startup

    • Initialize the schema registry and shard manager.
    • Bind a Unix listener and start accepting connections.
    • Spawn background workers (flush, compaction) per shard.
  • Connection handling

    • Spawn a task per client.
    • Read lines, parse into commands, dispatch to the shard manager.
  • Store

    • Route to shard by context_id.
    • Append to WAL, update active MemTable; rotate and enqueue flush when needed.
  • Query

    • Broadcast to all shards.
    • Each shard scans its in‑memory and on‑disk state and returns matches; results are merged.
  • Replay

    • Route to the shard for the context_id.
    • Stream events in original append order for that context.
  • Flush

    • Broadcast; shards rotate MemTables and enqueue flush to produce a new segment.

Why this design

  • Async I/O: efficient, scalable handling of many connections.
  • Shard workers: clear ownership and predictable performance.
  • Separation of concerns: networking and storage logic don’t intermingle.

Invariants

  • Frontends do not perform disk I/O or modify indexes directly.
  • Shard workers own shard state; cross‑shard mutable sharing is avoided.
  • Schema access uses async RwLock for safe concurrent reads/writes.

Operational notes

  • Bounded shard mailboxes apply local backpressure; tune channel sizes as needed.
  • Number of shards controls parallelism; size to match CPU/core availability.
  • Monitor channel depth and lock contention to spot hotspots.

Further Reading

Logging

What it is

  • SnelDB uses the tracing ecosystem for structured, leveled logs.
  • Logs are emitted to stdout and to a daily‑rotated file, with independent levels.
  • Levels and output directory are configured via the config file.

Core pieces

  • Initializer — sets up tracing_subscriber layers for stdout and file.
  • Config — [logging] section controls log_dir, stdout_level, and file_level.
  • Levels — error, warn, info, debug, trace.

How it works

  • Startup

    • logging::init() is called from main.rs before starting frontends.
    • Reads CONFIG.logging to build filters and writers.
    • Installs two layers: ANSI stdout and file appender (sneldb.log, daily rotation).
  • Emitting logs

    • Use tracing::{error!, warn!, info!, debug!, trace!} in code.
    • Prefer spans (e.g., #[instrument]) to capture context around operations.

Configuration

Example snippet from config.toml:

[logging]
log_dir = "../data/logs"
stdout_level = "debug"
file_level = "error"
  • stdout_level: global level for console logs.
  • file_level: global level for file logs.
  • log_dir: directory where sneldb.log is created (daily rotation).

Why this design

  • Structured logs with levels and spans ease debugging and operations.
  • Separate stdout/file control supports local development and production hygiene.

Operational notes

  • Tune levels per environment (e.g., stdout_level=warn in prod).
  • Ensure log_dir exists and is writable; it is created on first write by the appender.
  • Use targets when necessary to scope logs for noisy modules.

Further Reading

  • tracing crate docs
  • tracing_subscriber filters and formatters

File Formats and Data Layout

What it is

  • The on-disk layout for shards and segments, and the binary formats used for columns, offsets, zone metadata, indexes, and schemas.
  • These formats are append-friendly, read-optimized, and simple to parse with memory maps.

Core pieces

  • Segments — segment-xxxxx/ directories under each shard.
  • Columns — {uid}_{field}.col files storing values with length prefixes.
  • Zone Compressed Offsets — {uid}_{field}.zfc files listing per-zone compressed block metadata and in-block offsets.
  • Zone Metadata — {uid}.zones containing per-zone min/max timestamps and row ranges.
  • Zone Index — {uid}.idx mapping context_id values to zone ids.
  • XOR Filters — {uid}_{field}.xf per-field filters for fast membership tests.
  • Enum Bitmap Indexes — {uid}_{field}.ebm per-enum-field bitmaps for zone pruning.
  • Zone SuRF Filters — {uid}_{field}.zsrf per-field per-zone succinct range filters for range pruning.
  • Schemas — schema/schemas.bin append-only records of event type schemas and UIDs.

Binary headers

  • All binary files now begin with a fixed 20-byte header, which improves safety and helps detect corruption.
  • Header layout (little-endian):
    • 8 bytes: MAGIC (ASCII tag identifying file kind)
    • 2 bytes: VERSION (u16)
    • 2 bytes: FLAGS (u16)
    • 4 bytes: RESERVED (u32)
    • 4 bytes: HEADER_CRC32 (u32) computed over MAGIC+VERSION+FLAGS+RESERVED
  • WAL logs remain newline-delimited JSON without a binary header.

Magic strings per file kind:

  • Columns (.col): EVDBCOL\0
  • Zone Compressed Offsets (.zfc): EVDBZCF\0
  • Zone Metadata (.zones): EVDBZON\0
  • Zone Index (.idx per-UID/context): EVDBUID\0
  • XOR Filters (.xf): EVDBXRF\0
  • Zone SuRF Filters (.zsrf): EVDBZSF\0
  • Shard Segment Index (segments.idx): EVDBSIX\0
  • Schemas (schemas.bin): EVDBSCH\0
  • Enum Bitmap Index (.ebm): EVDBEBM\0
  • Event Snapshots (.snp): EVDBSNP\0
  • Snapshot Metadata (.smt): EVDBSMT\0

Compatibility and migration:

  • Readers tolerate legacy files that lack headers and continue to parse them (see the sketch after this list).
  • New writers always prepend the header.
  • A future strict mode may enforce headers on read.
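
For illustration only (not the actual reader), a header check with legacy tolerance might look like this; the crc32fast crate and the payload_offset helper are assumptions:

use std::io;

const HEADER_LEN: usize = 20;

// Hypothetical helper: returns the offset where payload data starts.
// Legacy, header-less files start at offset 0.
fn payload_offset(bytes: &[u8], magic: &[u8; 8]) -> io::Result<usize> {
    if bytes.len() < HEADER_LEN || bytes[..8] != magic[..] {
        return Ok(0); // tolerate legacy files without a header
    }
    // Little-endian layout: MAGIC (8) + VERSION (2) + FLAGS (2) + RESERVED (4) + CRC32 (4).
    let stored_crc = u32::from_le_bytes(bytes[16..20].try_into().unwrap());
    let computed = crc32fast::hash(&bytes[..16]); // over MAGIC+VERSION+FLAGS+RESERVED
    if stored_crc != computed {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "header CRC mismatch"));
    }
    Ok(HEADER_LEN)
}

fn main() {
    // Hypothetical path; any of the per-kind MAGIC values from the list above applies.
    let data = std::fs::read("segment-00000/uid123.zones").unwrap_or_default();
    match payload_offset(&data, b"EVDBZON\0") {
        Ok(offset) => println!("payload starts at byte {offset}"),
        Err(e) => eprintln!("corrupt header: {e}"),
    }
}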

Directory layout

data/
├── cols/
│   ├── shard-0/
│   │   └── segment-00000/
│   │       ├── {uid}_{field}.col
│   │       ├── {uid}_{field}.zfc
│   │       ├── {uid}.zones
│   │       ├── {uid}.idx
│   │       ├── {uid}_{field}.xf
│   │       ├── {uid}_{field}.zsrf
│   │       └── {uid}_{field}.ebm
│   └── shard-1/
│       └── segment-00000/
├── logs/
│   └── sneldb.log.YYYY-MM-DD
└── schema/
    └── schemas.bin

Snapshots are ad-hoc utility files and can be written anywhere (not tied to the segment layout). Typical usage writes them to a caller-provided path.

Column files: {uid}_{field}.col

  • File begins with a binary header (MAGIC EVDBCOL\0).
  • Format per value (binary):
    • [u16] little-endian length
    • [bytes] UTF‑8 string of the value
  • Access pattern: memory-mapped and sliced using offsets (see the sketch below).
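
A minimal sketch of walking a .col file sequentially, assuming the memmap2 crate and a hypothetical path; real readers jump directly to values via the offsets in the .zfc file:

use memmap2::Mmap;
use std::fs::File;

const HEADER_LEN: usize = 20; // fixed binary header described above

fn main() -> std::io::Result<()> {
    // Hypothetical path; real files live under data/cols/shard-N/segment-NNNNN/.
    let file = File::open("segment-00000/uid123_plan.col")?;
    let mmap = unsafe { Mmap::map(&file)? };

    let mut pos = HEADER_LEN;
    while pos + 2 <= mmap.len() {
        // [u16 LE] length prefix, then the UTF-8 bytes of the value.
        let len = u16::from_le_bytes([mmap[pos], mmap[pos + 1]]) as usize;
        pos += 2;
        if pos + len > mmap.len() {
            break; // truncated tail: stop rather than panic
        }
        let value = std::str::from_utf8(&mmap[pos..pos + len]).unwrap_or("<invalid utf-8>");
        println!("{value}");
        pos += len;
    }
    Ok(())
}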

Zone compressed offsets: {uid}_{field}.zfc

  • File begins with a binary header (MAGIC EVDBZCF\0).
  • Binary layout per zone (repeated):
    • [u32] zone_id
    • [u32] count: number of offsets
    • [u64] × count: byte offsets into the corresponding .col
  • Purpose: enables loading only the rows for a given zone by first reading and decompressing the zone block, then slicing values using in-block offsets.

Zone metadata: {uid}.zones

  • File begins with a binary header (MAGIC EVDBZON\0).
  • Bincode-encoded Vec<ZoneMeta>; a decoding sketch follows the field list.
  • Fields:
    • zone_id: u32
    • uid: String
    • segment_id: u64
    • start_row: u32
    • end_row: u32
    • timestamp_min: u64
    • timestamp_max: u64
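
A minimal decoding sketch, assuming bincode 1.x, serde, the 20-byte header described above, and a mirror ZoneMeta type with the field names listed here (the real type lives inside SnelDB):

use serde::Deserialize;

// Mirror type for illustration only; fields and order follow the list above.
#[derive(Debug, Deserialize)]
struct ZoneMeta {
    zone_id: u32,
    uid: String,
    segment_id: u64,
    start_row: u32,
    end_row: u32,
    timestamp_min: u64,
    timestamp_max: u64,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let bytes = std::fs::read("segment-00000/uid123.zones")?; // hypothetical path
    let zones: Vec<ZoneMeta> = bincode::deserialize(&bytes[20..])?; // skip the binary header
    for zone in &zones {
        println!("{:?}", zone);
    }
    Ok(())
}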

Zone index: {uid}.idx

  • Binary map of event_type -> context_id -> [zone_id...].
  • Used to quickly locate candidate zones by context_id.
  • Written via ZoneIndex::write_to_path and read with ZoneIndex::load_from_path.
  • File begins with a binary header (MAGIC EVDBUID\0).

XOR filters: {uid}_{field}.xf

  • Bincode-serialized BinaryFuse8 filter over unique field values.
  • Used for fast approximate membership checks during planning (see the sketch below).
  • File begins with a binary header (MAGIC EVDBXRF\0).
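
To illustrate the idea (not the stored format itself), here is a sketch using the xorf crate's BinaryFuse8; hashing string values to u64 keys is an assumption about how field values become filter keys:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use xorf::{BinaryFuse8, Filter};

// Assumption for illustration: field values are hashed to u64 keys.
fn key(value: &str) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

fn main() {
    let keys: Vec<u64> = ["basic", "pro", "enterprise"].into_iter().map(key).collect();
    let filter = BinaryFuse8::try_from(&keys).expect("filter construction");

    // Approximate membership: no false negatives, small false-positive rate.
    assert!(filter.contains(&key("pro")));
    println!("maybe present: {}", filter.contains(&key("unknown-plan")));
}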

Zone SuRF filters: {uid}_{field}.zsrf

  • Bincode-serialized ZoneSurfFilter containing Vec<ZoneSurfEntry>.
  • Purpose: zone-level range pruning for numeric, string, and boolean fields using a succinct trie.
  • File begins with a binary header (MAGIC EVDBZSF\0).
  • Contents:
    • entries: Vec<ZoneSurfEntry> where each entry is { zone_id: u32, trie: SurfTrie }.
    • SurfTrie stores compact arrays of degrees, child offsets, labels, and terminal flags.
  • Built during flush/compaction by ZoneWriter::write_all.
  • Used by ZoneFinder for Gt/Gte/Lt/Lte operations before falling back to XOR/EBM.
  • Naming mirrors .xf/.ebm: per uid and field.

Enum bitmap index: {uid}_{field}.ebm

  • Zone-level bitmaps per enum variant for fast Eq/Neq pruning.
  • File begins with a binary header (MAGIC EVDBEBM\0).
  • Binary layout:
    • [u16] variant_count
    • Repeated variant_count times:
      • [u16] name_len
      • [bytes] variant_name (UTF‑8)
    • [u16] rows_per_zone
    • Repeated per zone present in the file:
      • [u32] zone_id
      • [u16] variant_count_again
      • Repeated variant_count_again times:
        • [u32] bitmap_len_bytes
        • [bytes] packed_bitmap (LSB-first within a byte; bit i set ⇒ row i has this variant)
  • Usage: for a filter plan = "pro", prune zones whose pro bitmap is all zeros; for !=, keep a zone if any non-target variant has a bit set (see the sketch below).
  • Observability: use convertor ebm <segment_dir> <uid> <field> to dump a JSON view of per-zone row positions per variant.
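
A sketch of just the pruning decision, using hypothetical helpers over already-decoded bitmaps: a zone survives plan = "pro" only if the pro bitmap has any bit set, and survives != only if some other variant does:

// Packed bitmap as stored per (zone, variant): LSB-first within each byte.
fn any_row_set(bitmap: &[u8]) -> bool {
    bitmap.iter().any(|b| *b != 0)
}

// Eq pruning: keep the zone only if some row carries the target variant.
fn zone_matches_eq(variant_bitmaps: &[(&str, Vec<u8>)], target: &str) -> bool {
    variant_bitmaps
        .iter()
        .find(|(name, _)| *name == target)
        .map(|(_, bitmap)| any_row_set(bitmap))
        .unwrap_or(false)
}

// Neq pruning: keep the zone if any other variant has a bit set.
fn zone_matches_neq(variant_bitmaps: &[(&str, Vec<u8>)], target: &str) -> bool {
    variant_bitmaps
        .iter()
        .any(|(name, bitmap)| *name != target && any_row_set(bitmap))
}

fn main() {
    // Example zone: rows 0 and 2 are "pro", row 1 is "basic" (rows_per_zone = 3).
    let zone: Vec<(&str, Vec<u8>)> = vec![("basic", vec![0b0000_0010]), ("pro", vec![0b0000_0101])];
    assert!(zone_matches_eq(&zone, "pro"));
    assert!(zone_matches_neq(&zone, "enterprise"));
    println!("zone survives plan = \"pro\": {}", zone_matches_eq(&zone, "pro"));
}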

Schemas: schema/schemas.bin

  • Append-only file of bincode-encoded SchemaRecord entries:
    • uid: String
    • event_type: String
    • schema: MiniSchema
  • Loaded at startup by SchemaRegistry.
  • File begins with a binary header (MAGIC EVDBSCH\0).

Shard segment index: segments.idx

  • Bincode-encoded Vec<SegmentEntry>; file begins with a binary header (MAGIC EVDBSIX\0).

Why this design

  • Immutable segments + append-only metadata simplify recovery and concurrency.
  • Memory-mappable, length-prefixed encodings keep parsing simple and fast.
  • Separate files per concern (values, offsets, metadata, indexes) enable targeted IO.

Operational notes

  • Segment directories are named segment-00000, segment-00001, ...
  • UIDs are per-event-type identifiers generated at DEFINE; filenames use {uid} not the event type.
  • New fields simply create new .col/.zfc/.xf files in subsequent segments.

Event snapshots: *.snp

  • Purpose: portable bundles of events (potentially mixed types) for export, testing, or replay.
  • File begins with a binary header (MAGIC EVDBSNP\0).
  • Binary layout after header:
    • [u32] num_events
    • Repeated num_events times:
      • [u32] len_bytes
      • [bytes] JSON-serialized Event (same schema as API/Event struct)
  • Notes:
    • Events are serialized as JSON for compatibility (payloads can contain arbitrary JSON values).
    • Readers stop gracefully on truncated data (warn and return successfully with the parsed prefix); see the reader sketch below.
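
A reader sketch under the assumptions above (little-endian prefixes to match the header, serde_json for the event bodies, and a hypothetical file path):

fn main() -> std::io::Result<()> {
    let bytes = std::fs::read("export.snp")?; // hypothetical snapshot path
    let mut pos = 20; // skip the fixed 20-byte binary header (MAGIC EVDBSNP\0, ...)

    if bytes.len() < pos + 4 {
        eprintln!("warning: snapshot too short to contain a count");
        return Ok(());
    }
    let num_events = u32::from_le_bytes(bytes[pos..pos + 4].try_into().unwrap());
    pos += 4;

    let mut events: Vec<serde_json::Value> = Vec::new();
    for _ in 0..num_events {
        if pos + 4 > bytes.len() {
            eprintln!("warning: truncated snapshot, keeping {} events", events.len());
            break;
        }
        let len = u32::from_le_bytes(bytes[pos..pos + 4].try_into().unwrap()) as usize;
        pos += 4;
        if pos + len > bytes.len() {
            eprintln!("warning: truncated snapshot, keeping {} events", events.len());
            break;
        }
        // Each record is a JSON-serialized Event; parse it generically here.
        if let Ok(event) = serde_json::from_slice(&bytes[pos..pos + len]) {
            events.push(event);
        }
        pos += len;
    }
    println!("loaded {} events", events.len());
    Ok(())
}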

Snapshot metadata: *.smt

  • Purpose: describes snapshot ranges per (uid, context_id) with min/max timestamps.
  • File begins with a binary header (MAGIC EVDBSMT\0).
  • Binary layout after header:
    • [u32] num_records
    • Repeated num_records times:
      • [u32] len_bytes
      • [bytes] JSON-serialized SnapshotMeta { uid, context_id, from_ts, to_ts }
  • Notes:
    • JSON is used for the same reasons as snapshots (arbitrary strings/IDs, forward-compat fields).
    • Readers stop gracefully on truncated data (warn and return successfully with the parsed prefix).