HelixFS

Zero-dependency filesystem with CoW, journaling, B+Tree indexing, and ARC caching.

Documentation

HelixFS Overview

HelixFS (helix-fs, 65 source files, ~12,000 lines) is Helix's native filesystem, built from scratch with zero external dependencies. It uses a Copy-on-Write B-tree architecture inspired by Btrfs and ZFS, providing instant snapshots, data integrity, and compression.

Key Features

FeatureImplementation
Copy-on-WriteAll writes create new blocks — originals preserved
B-tree indexingSelf-balancing tree for O(log n) lookups
JournalingWrite-Ahead Log (WAL) for crash recovery
SnapshotsInstant, space-efficient snapshots via CoW
CompressionAdaptive — LZ4 (fast) or Zstd (ratio)
EncryptionAES-256-GCM AEAD, per-file or per-volume
Extent-based allocationContiguous block ranges for large files
VFS layerPOSIX-like interface for the rest of the kernel

Constants

fs/src/lib.rs
rust
pub const HFS_MAGIC: u64 = 0x3153_4658_494C_4548; // "HELIXFS1" (little-endian)
pub const BLOCK_SIZE: usize = 4096; // 4 KB pages
pub const MAX_NAME_LEN: usize = 255; // Filename limit
pub const MAX_FILE_SIZE: u64 = 1 << 60; // 1 EB practical limit
pub const SUPERBLOCK_REPLICAS: usize = 8; // Primary replicated 8x
pub const ROOT_INO: u64 = 1; // Root directory inode
Index

Module Architecture

HelixFS — Module Architecture14N · 13E
helix-fs/src/HelixFS source tree13lib.rsPublic API, mount/un…1core/Primitives1disk/On-disk structures1tree/B-tree engine1alloc/Block allocation1vfs/VFS abstraction1ops/File operations1cache/Caching layers1journal/Journaling / WAL1compress/Compression1crypto/Encryption1snapshot/Snapshots1api/High-level API1
100%
☝ Drag to pan·🤏 Pinch to zoom·Tap a node

Disk Layout

HelixFS — Disk Layout5N · 4E
Boot Sector0 – 64 KB1Superblock64 KB offset2Journal / WALVariable size2Allocation GroupsBitmap + freelist2Data BlocksB-tree nodes + file …1
100%
☝ Drag to pan·🤏 Pinch to zoom·Tap a node

Superblock

fs/src/superblock.rs
rust
1
#[repr(C)]
pub struct Superblock {
3
pub magic: u64, // HFS_MAGIC
4
pub version: u32,
5
pub block_size: u32, // Always 4096
6
pub total_blocks: u64,
7
pub free_blocks: u64,
8
pub root_tree_root: u64, // Root of the fs B-tree
9
pub inode_tree_root: u64,
10
pub journal_start: u64,
11
pub journal_size: u64,
12
pub features_compat: u32, // Compatible feature flags
13
pub features_incompat: u32, // Incompatible feature flags
14
pub features_ro_compat: u32, // Read-only compatible
15
pub checksum: u32,
16
pub mount_count: u32,
17
pub mount_time: u64,
18
pub uuid: [u8; 16],
19
pub label: [u8; 32],
20
// ... ~20 more fields (crypto, snapshots, etc.)
21
}
Index

Inode

fs/src/inode.rs
rust
1
#[repr(C)]
pub struct OnDiskInode {
3
pub ino: u64, // Inode number
4
pub mode: u32, // File type + permissions
5
pub uid: u32,
6
pub gid: u32,
7
pub size: u64, // File size in bytes
8
pub blocks: u64, // Allocated blocks
9
pub atime: u64,
10
pub mtime: u64,
11
pub ctime: u64,
12
pub crtime: u64, // Creation time
13
pub nlink: u32, // Hard link count
14
pub flags: u32, // InodeFlags bits
15
pub generation: u32,
16
pub inline_extent_count: u32,
17
pub checksum: u32,
18
// Inline extents follow in the remaining 256-byte record
19
}
Index

B-tree Engine

The B-tree is the core data structure of HelixFS. All metadata — directory entries, extent maps, free space — is stored in B-trees.

Node Structure

fs/src/btree.rs
rust
pub enum BTreeNode {
2
Internal {
3
keys: Vec<Key>,
4
children: Vec<BlockId>, // Pointers to child nodes
5
level: u32,
6
},
7
Leaf {
8
entries: Vec<(Key, Value)>, // Key-value pairs
9
next: Option<BlockId>, // Sibling link for range scans
10
},
11
}
Index

Each node fits in a single NODE_SIZE (16 KB) block. With an average key size of 32 bytes, an internal node can hold ~500 keys, giving a branching factor of ~500.

Operations

OperationComplexityDescription
SearchO(log n)Binary search within nodes, tree traversal
InsertO(log n)Insert + split if node is full
DeleteO(log n)Delete + merge if node is underfull
Range scanO(log n + k)Find start, then follow sibling links

Copy-on-Write

When modifying a B-tree node:

  1. Allocate a new block for the modified node
  2. Copy the node contents and apply the modification
  3. Update the parent pointer to the new block
  4. The old block remains unchanged (referenced by snapshots)

This cascades up to the root — each modification creates a new path from the modified leaf to a new root. The old root is preserved as the snapshot root.


Journaling

HelixFS uses a Write-Ahead Log (WAL) for crash consistency.

Transaction Model

fs/src/journal.rs
rust
pub struct Transaction {
2
pub id: TransactionId,
3
pub records: Vec<JournalRecord>,
4
pub state: TxnState,
5
}
6
2 refs
pub enum TxnState {
8
Active, // Writes in progress
9
Committed, // All records written to WAL
10
Checkpointed, // Changes flushed to main area
11
Aborted, // Rolled back
12
}
13
2 refs
pub enum JournalRecord {
15
BlockWrite { block_id: u64, data: Vec<u8> },
16
InodeUpdate { inode: Inode },
17
Allocation { block_id: u64, count: u64 },
18
Deallocation { block_id: u64, count: u64 },
19
TreeModify { node_id: u64, operation: TreeOp },
20
}
Index

Write Flow

HelixFS — Journaled Write Flow6N · 5E
fsyncbackgroundApplication WriteUser issues write re…1Begin TransactionAllocate Transaction…2Write to WALSequential, fast2Commit TransactionWrite commit record …2AcknowledgeData is durable2Background: CheckpointFlush WAL → data are…1
100%
☝ Drag to pan·🤏 Pinch to zoom·Tap a node

Crash Recovery

On mount after an unclean shutdown:

  1. Read the WAL from the journal area
  2. Find the last committed transaction
  3. Replay all committed records that weren't checkpointed
  4. Discard any uncommitted (active/aborted) transactions
  5. Resume normal operation

Snapshots

Snapshots leverage the CoW B-tree to capture the filesystem state at a point in time.

Creating a Snapshot

fs/src/snapshot.rs
rust
impl SnapshotManager {
pub fn create(&mut self, name: &str) -> Result<SnapshotId>;
pub fn delete(&mut self, id: SnapshotId) -> Result<()>;
pub fn list(&self) -> Vec<SnapshotInfo>;
pub fn diff(&self, a: SnapshotId, b: SnapshotId) -> Result<Vec<DiffEntry>>;
pub fn rollback(&mut self, id: SnapshotId) -> Result<()>;
7
}
Index

Creating a snapshot is instantaneous — it simply records the current B-tree root. Since all writes use CoW, the snapshot's data is never modified.

Snapshot Tree

Snapshots form a tree structure (parent-child relationships):

Snapshot Tree — Parent-Child Relationships6N · 5E
root-snapshotBase filesystem stat…3daily-2024-01-01Daily snapshot3hourly-10amHourly snapshot1hourly-11amHourly snapshot1daily-2024-01-02Daily snapshot1before-upgradePre-upgrade snapshot1
100%
☝ Drag to pan·🤏 Pinch to zoom·Tap a node

Deleting a snapshot only frees blocks that are not referenced by any other snapshot.


Compression

HelixFS supports transparent per-block compression with two algorithms:

Algorithms

AlgorithmSpeedRatioUse Case
LZ4~500 MB/s compress, ~1.5 GB/s decompress2:1 typicalDefault — fast for interactive use
Zstd~100 MB/s compress, ~500 MB/s decompress3:1+ typicalArchival data, cold storage

Adaptive Selection

The AdaptiveCompressor automatically selects the best algorithm per block:

  1. Compress the block with both LZ4 and Zstd
  2. If neither achieves > 12.5% reduction, store uncompressed
  3. If both achieve good ratios, prefer LZ4 (speed) unless Zstd is > 20% better (ratio)
  4. Cache the decision per-file for subsequent blocks

Encryption

HelixFS supports per-file and per-volume encryption using AES-256-GCM (Authenticated Encryption with Associated Data).

Key Management

fs/src/encryption.rs
rust
pub struct KeyManager {
pub fn derive_key(&self, passphrase: &[u8], salt: &[u8]) -> Key256;
pub fn wrap_key(&self, key: &Key256, wrapping_key: &Key256) -> WrappedKey;
pub fn unwrap_key(&self, wrapped: &WrappedKey, wrapping_key: &Key256) -> Result<Key256>;
5
}
Index

Encryption Modes

ModeScopeKey Management
Per-volumeEntire filesystemSingle master key, derived from passphrase
Per-fileIndividual filesPer-file key, wrapped by master key
Per-directoryDirectory treeShared key for all files in subtree

Integrity

Every block includes a checksum verified on read:

  • Unencrypted: CRC32 or xxHash
  • Encrypted: AES-GCM authentication tag (128-bit)

A checksum mismatch indicates data corruption and is reported as an I/O error.


VFS Layer

The Virtual Filesystem layer provides a POSIX-like interface that the rest of the kernel uses:

File Operations

fs/src/vfs.rs
rust
pub trait VfsOps {
fn open(&self, path: &str, flags: OpenFlags) -> Result<FileHandle>;
fn close(&self, handle: FileHandle) -> Result<()>;
fn read(&self, handle: FileHandle, buf: &mut [u8], offset: u64) -> Result<usize>;
fn write(&self, handle: FileHandle, buf: &[u8], offset: u64) -> Result<usize>;
fn stat(&self, path: &str) -> Result<FileStat>;
fn mkdir(&self, path: &str, mode: u32) -> Result<()>;
fn rmdir(&self, path: &str) -> Result<()>;
fn unlink(&self, path: &str) -> Result<()>;
fn rename(&self, old: &str, new: &str) -> Result<()>;
fn readdir(&self, path: &str) -> Result<Vec<DirEntry>>;
fn symlink(&self, target: &str, link: &str) -> Result<()>;
fn readlink(&self, path: &str) -> Result<String>;
14
}
Index

Mount System

fs/src/mount.rs
rust
pub struct MountManager {
pub fn mount(&mut self, device: &str, mountpoint: &str,
3
fstype: &str, flags: MountFlags) -> Result<()>;
pub fn unmount(&mut self, mountpoint: &str, flags: UnmountFlags) -> Result<()>;
pub fn list_mounts(&self) -> Vec<MountInfo>;
6
}
Index

Caching

HelixFS uses a four-level cache hierarchy:

CacheScopeEvictionPurpose
Page CacheData blocksLRURecently read/written file data
Inode CacheMetadataLRURecently accessed file metadata
Buffer CacheRaw blocksLRUDisk block caching
ARCB-tree nodesAdaptiveB-tree node caching (scan-resistant)

The ARC (Adaptive Replacement Cache) balances between recent and frequent access patterns, preventing cache pollution from sequential scans.