HelixFS
Zero-dependency filesystem with CoW, journaling, B+Tree indexing, and ARC caching.
HelixFS Overview
HelixFS (helix-fs, 65 source files, ~12,000 lines) is Helix's native filesystem, built from scratch with zero external dependencies. It uses a Copy-on-Write B-tree architecture inspired by Btrfs and ZFS, providing instant snapshots, data integrity, and compression.
Key Features
| Feature | Implementation |
|---|---|
| Copy-on-Write | All writes create new blocks — originals preserved |
| B-tree indexing | Self-balancing tree for O(log n) lookups |
| Journaling | Write-Ahead Log (WAL) for crash recovery |
| Snapshots | Instant, space-efficient snapshots via CoW |
| Compression | Adaptive — LZ4 (fast) or Zstd (ratio) |
| Encryption | AES-256-GCM AEAD, per-file or per-volume |
| Extent-based allocation | Contiguous block ranges for large files |
| VFS layer | POSIX-like interface for the rest of the kernel |
Constants
Module Architecture
Disk Layout
Superblock
Inode
B-tree Engine
The B-tree is the core data structure of HelixFS. All metadata — directory entries, extent maps, free space — is stored in B-trees.
Node Structure
Each node fits in a single NODE_SIZE (16 KB) block. With an average key size of 32 bytes, an internal node can hold ~500 keys, giving a branching factor of ~500.
Operations
| Operation | Complexity | Description |
|---|---|---|
| Search | O(log n) | Binary search within nodes, tree traversal |
| Insert | O(log n) | Insert + split if node is full |
| Delete | O(log n) | Delete + merge if node is underfull |
| Range scan | O(log n + k) | Find start, then follow sibling links |
Copy-on-Write
When modifying a B-tree node:
- Allocate a new block for the modified node
- Copy the node contents and apply the modification
- Update the parent pointer to the new block
- The old block remains unchanged (referenced by snapshots)
This cascades up to the root — each modification creates a new path from the modified leaf to a new root. The old root is preserved as the snapshot root.
Journaling
HelixFS uses a Write-Ahead Log (WAL) for crash consistency.
Transaction Model
Write Flow
Crash Recovery
On mount after an unclean shutdown:
- Read the WAL from the journal area
- Find the last committed transaction
- Replay all committed records that weren't checkpointed
- Discard any uncommitted (active/aborted) transactions
- Resume normal operation
Snapshots
Snapshots leverage the CoW B-tree to capture the filesystem state at a point in time.
Creating a Snapshot
Creating a snapshot is instantaneous — it simply records the current B-tree root. Since all writes use CoW, the snapshot's data is never modified.
Snapshot Tree
Snapshots form a tree structure (parent-child relationships):
Deleting a snapshot only frees blocks that are not referenced by any other snapshot.
Compression
HelixFS supports transparent per-block compression with two algorithms:
Algorithms
| Algorithm | Speed | Ratio | Use Case |
|---|---|---|---|
| LZ4 | ~500 MB/s compress, ~1.5 GB/s decompress | 2:1 typical | Default — fast for interactive use |
| Zstd | ~100 MB/s compress, ~500 MB/s decompress | 3:1+ typical | Archival data, cold storage |
Adaptive Selection
The AdaptiveCompressor automatically selects the best algorithm per block:
- Compress the block with both LZ4 and Zstd
- If neither achieves > 12.5% reduction, store uncompressed
- If both achieve good ratios, prefer LZ4 (speed) unless Zstd is > 20% better (ratio)
- Cache the decision per-file for subsequent blocks
Encryption
HelixFS supports per-file and per-volume encryption using AES-256-GCM (Authenticated Encryption with Associated Data).
Key Management
Encryption Modes
| Mode | Scope | Key Management |
|---|---|---|
| Per-volume | Entire filesystem | Single master key, derived from passphrase |
| Per-file | Individual files | Per-file key, wrapped by master key |
| Per-directory | Directory tree | Shared key for all files in subtree |
Integrity
Every block includes a checksum verified on read:
- Unencrypted: CRC32 or xxHash
- Encrypted: AES-GCM authentication tag (128-bit)
A checksum mismatch indicates data corruption and is reported as an I/O error.
VFS Layer
The Virtual Filesystem layer provides a POSIX-like interface that the rest of the kernel uses:
File Operations
Mount System
Caching
HelixFS uses a four-level cache hierarchy:
| Cache | Scope | Eviction | Purpose |
|---|---|---|---|
| Page Cache | Data blocks | LRU | Recently read/written file data |
| Inode Cache | Metadata | LRU | Recently accessed file metadata |
| Buffer Cache | Raw blocks | LRU | Disk block caching |
| ARC | B-tree nodes | Adaptive | B-tree node caching (scan-resistant) |
The ARC (Adaptive Replacement Cache) balances between recent and frequent access patterns, preventing cache pollution from sequential scans.