Query and Replay
Overview
SnelDB reads come in two flavors:
- QUERY: filter one event type by predicates, time, and an optional `context_id`; may span shards.
- REPLAY: stream all events for one `context_id` (optionally one type) in original append order; single shard.
Both use the same internals as the write path: in‑memory MemTable, on‑disk immutable segments, per‑segment zones, and compact per‑field filters.
When to Use Which
- Use `QUERY` for analytics, debugging slices, and ad‑hoc filters across many contexts.
- Use `REPLAY` to rebuild state or audit the exact sequence for one context.
Examples
- QUERY
  - Investigate: “All `order_created` over $100 in the last 24h across all users”
  - Dashboard: “Errors by type this week”
  - Debug: “Sessions with `status = 'pending'` and `retries > 3`”
- REPLAY
  - Operational debugging (incident timeline): `REPLAY system_event FOR host-123 SINCE "2024-05-01T00:00:00Z"`
  - Auditing/compliance (full account trail): `REPLAY FOR account-42 SINCE "2024-01-01T00:00:00Z"`
  - ML pipelines (rebuild a customer’s transaction sequence): `REPLAY transaction FOR user-456 SINCE "2023-01-01T00:00:00Z"`
  - Product journey (single user or session in order): `REPLAY FOR user-123`
Command Cheatsheet
QUERY <event_type> [FOR <context_id>] [SINCE <ts>] [WHERE <expr>] [LIMIT <n>]
REPLAY [<event_type>] FOR <context_id> [SINCE <ts>]
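For instance, combining the clauses above (the field names `amount`, `status`, and `retries` are illustrative, not part of a fixed schema):
QUERY order_created SINCE "2024-05-01T00:00:00Z" WHERE amount > 100 LIMIT 50
QUERY session FOR user-123 WHERE status = 'pending' AND retries > 3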
More examples: Query and Replay
How It Works
QUERY (step‑by‑step)
- Parse and validate inputs.
- Plan shard tasks (fan‑out unless narrowed by `context_id`).
- Per shard, scan MemTable and pick relevant segments.
- Prune zones by time and per‑field filters; read only needed columns.
  - Range predicates (`>`, `>=`, `<`, `<=`) are pruned using Zone SuRF (`{uid}_{field}.zsrf`) when present, falling back to XOR/EBM only if unavailable. SuRF is an order‑preserving trie using succinct arrays for fast range overlap checks.
  - Equality predicates (`=`, `IN`) use Zone XOR indexes (`{uid}_{field}.zxf`) for fast zone lookup.
  - Complex WHERE clauses with parentheses, AND/OR/NOT are transformed into a FilterGroup tree, and zones are combined using set operations (intersection for AND, union for OR, complement for NOT); a sketch of this set combination follows the list. See Filter Architecture for details.
- Evaluate predicates and apply the `WHERE` condition.
- If aggregations are present:
  - Build an aggregation plan (ops, optional group_by, optional time bucket and selected time field).
  - In each shard, update aggregators from both MemTable (row path) and segments (columnar path). Segment scans project only needed columns (filters, group_by, time field, agg inputs).
  - Group keys combine optional time bucket with `group_by` values; a fast prehash accelerates hashmap grouping.
  - Merge partial aggregation states across shards; finalize into a table (bucket? + group columns + metric columns). `LIMIT` caps distinct groups.
- Otherwise (selection path):
  - Merge rows; apply global `LIMIT` if set.
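To make the AND/OR/NOT zone combination above concrete, here is a minimal sketch in Rust; the `FilterNode` type and `candidate_zones` function are illustrative, not SnelDB's actual internals:

```rust
use std::collections::BTreeSet;

/// Illustrative filter tree: leaves resolve to candidate zone ids,
/// inner nodes combine child results with set operations.
enum FilterNode {
    Leaf(BTreeSet<u32>),  // zones surviving one predicate's index lookup
    And(Vec<FilterNode>), // intersection
    Or(Vec<FilterNode>),  // union
    Not(Box<FilterNode>), // complement against all zones in the segment
}

fn candidate_zones(node: &FilterNode, all_zones: &BTreeSet<u32>) -> BTreeSet<u32> {
    match node {
        FilterNode::Leaf(zones) => zones.clone(),
        FilterNode::And(children) => children
            .iter()
            .map(|c| candidate_zones(c, all_zones))
            .reduce(|a, b| a.intersection(&b).cloned().collect())
            .unwrap_or_default(),
        FilterNode::Or(children) => children
            .iter()
            .map(|c| candidate_zones(c, all_zones))
            .fold(BTreeSet::new(), |a, b| a.union(&b).cloned().collect()),
        FilterNode::Not(child) => all_zones
            .difference(&candidate_zones(child, all_zones))
            .cloned()
            .collect(),
    }
}
```

Only the surviving zone ids are then read column-by-column, which is what keeps complex WHERE clauses from forcing a full segment scan.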
Sequence Queries (step‑by‑step)
Sequence queries (FOLLOWED BY, PRECEDED BY, LINKED BY) follow a specialized path optimized for finding ordered event pairs:
- Parse sequence: Extract event types, link field, and sequence operator from the query.
- Parallel zone collection: Collect zones for all event types in parallel across shards. Each event type gets its own query plan with transformed WHERE clauses (event-prefixed fields like `page_view.page` become `page` for the `page_view` plan).
- Index strategy assignment: Assign index strategies to filter plans so zone XOR indexes are used for field filters.
- Zone hydration: Load column values (including the `link_field`) without materializing events.
- Grouping: Group row indices by `link_field` value using columnar data. Within each group, sort by timestamp.
- Matching: Apply the two-pointer algorithm to find matching sequences (see the sketch after this list):
  - For FOLLOWED BY: find events where `event_type_b` occurs at the same timestamp or later
  - For PRECEDED BY: find events where `event_type_b` occurred strictly before
  - Apply WHERE clause filters during matching to avoid materializing non-matching events
- Materialization: Only materialize events from matched sequences, using `EventBuilder` and `PreparedAccessor` for efficient construction.
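A minimal sketch of the two-pointer matching step for FOLLOWED BY within one `link_field` group; the `Row` struct and the pair-with-first-following-event rule are illustrative assumptions, not SnelDB's exact matching code:

```rust
/// One row per event, already grouped by the link field and sorted by timestamp.
/// The field layout here is illustrative.
struct Row {
    timestamp: u64,
    is_type_a: bool, // true for event_type_a rows, false for event_type_b rows
    row_index: usize,
}

/// Two-pointer scan over one link_field group: for each `a`, pair it with the
/// first `b` whose timestamp is >= a.timestamp (FOLLOWED BY semantics).
fn match_followed_by(rows: &[Row]) -> Vec<(usize, usize)> {
    let a_rows: Vec<&Row> = rows.iter().filter(|r| r.is_type_a).collect();
    let b_rows: Vec<&Row> = rows.iter().filter(|r| !r.is_type_a).collect();

    let mut matches = Vec::new();
    let mut j = 0;
    for a in &a_rows {
        // Advance the b pointer until it is no longer before a; it never moves back,
        // so the whole group is matched in a single linear pass.
        while j < b_rows.len() && b_rows[j].timestamp < a.timestamp {
            j += 1;
        }
        if let Some(b) = b_rows.get(j) {
            matches.push((a.row_index, b.row_index));
        }
    }
    matches
}
```

Because both sides are already sorted by timestamp, the scan is linear in the group size, and only the row indices in `matches` ever get materialized into events.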
Performance optimizations:
- Columnar processing avoids premature event materialization
- Early filtering reduces the search space before grouping
- Parallel zone collection for different event types
- Index usage for `link_field` and `event_type` filters
- Limit short-circuiting stops processing once enough matches are found
REPLAY (step‑by‑step)
- Parse and validate inputs.
- Route to the shard owning the `context_id` (a routing sketch follows this list).
- Scan MemTable and relevant segments for that context.
- Apply optional `event_type` and `SINCE` filters.
- Stream events in original append order.
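The routing step can be pictured as a stable hash of the `context_id`; the hash function and modulo assignment below are an illustrative sketch, not necessarily how SnelDB assigns shards:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a context_id to one shard deterministically, so every event for the
/// same context is written to (and replayed from) the same shard.
fn shard_for(context_id: &str, shard_count: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    context_id.hash(&mut hasher);
    (hasher.finish() as usize) % shard_count // shard_count must be > 0
}
```

Deterministic routing is what lets REPLAY stay a single-shard, in-order scan with no cross-shard merge.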
What You Get
- Visibility: fresh writes are visible from the `MemTable` before flush.
- Ordering: `REPLAY` preserves append order (single shard). `QUERY` has no global ordering unless you explicitly sort at merge (costly) or scope the query narrowly.
- LIMIT (`QUERY`): short‑circuit per shard when possible; always cap globally during merge (sketched below).
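A sketch of that merge-side cap (names are illustrative): each shard may stop early once it has `limit` rows, and the coordinator still truncates the combined result because shards can overshoot collectively:

```rust
/// Merge per-shard result batches and cap the total at `limit`.
fn merge_with_limit<T>(shard_results: Vec<Vec<T>>, limit: Option<usize>) -> Vec<T> {
    // Concatenate per-shard batches (no global ordering is implied here).
    let mut merged: Vec<T> = shard_results.into_iter().flatten().collect();
    if let Some(n) = limit {
        merged.truncate(n); // global cap applied exactly once, at merge time
    }
    merged
}
```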
Performance Tips
- Prune early: favor `event_type`, `context_id`, and `SINCE` to skip zones fast.
- Shard wisely: more shards increase scan parallelism but cost more on fan‑out.
Tuning
- `events_per_zone`: smaller zones = better pruning, more metadata; larger zones = fewer skips, less metadata.
- `flush_threshold`: affects how much is in memory vs on disk, and segment cadence.
- Shard count: match to CPU and expected concurrency.
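As a rough mental model, the knobs can be grouped like this; the struct, the shard-count field name, and the numbers are illustrative, only `events_per_zone` and `flush_threshold` are named on this page:

```rust
/// Illustrative tuning knobs; SnelDB's real configuration surface may differ.
struct EngineTuning {
    events_per_zone: usize, // smaller -> finer zone pruning, more metadata
    flush_threshold: usize, // events buffered in MemTable before a segment flush
    shard_count: usize,     // match CPU cores and expected concurrency
}

fn example_tuning() -> EngineTuning {
    EngineTuning {
        events_per_zone: 1024,
        flush_threshold: 50_000,
        shard_count: 8,
    }
}
```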
Invariants
- Immutability: events and segments are never edited in place.
- Single‑shard replay: each `context_id` maps to exactly one shard.
- Schema validity: stored payloads conform to their event type schema.
- Atomic publication: new segments become visible all‑or‑nothing.
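One way to picture the atomic-publication invariant is a snapshot-and-swap over the segment list; this is an illustrative sketch, not SnelDB's actual mechanism:

```rust
use std::sync::{Arc, RwLock};

/// Readers clone the current Arc and scan a consistent snapshot; a flush builds
/// the new segment list off to the side and swaps it in one step, so a new
/// segment is either fully visible or not visible at all.
struct SegmentCatalog {
    segments: RwLock<Arc<Vec<String>>>, // segment ids; real metadata is richer
}

impl SegmentCatalog {
    fn snapshot(&self) -> Arc<Vec<String>> {
        self.segments.read().unwrap().clone()
    }

    fn publish(&self, new_segment: String) {
        let mut guard = self.segments.write().unwrap();
        let mut next = (**guard).clone();
        next.push(new_segment);
        *guard = Arc::new(next); // single publication point
    }
}
```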
Further Reading
SnelDB’s read path is simple to reason about: prune aggressively, read only what you need, and merge efficiently—whether you’re slicing across many contexts or replaying one.