One of Helix's most distinctive features is the ability to replace kernel modules at runtime without rebooting, combined with an automated self-healing system that detects failures and attempts recovery. Together, these features mean the kernel can fix itself — a crashed scheduler gets restarted, a buggy driver gets hot-swapped with a fresh version, all while the rest of the system keeps running.
Any module that supports live replacement must implement the HotReloadableModule trait. The key methods are export_state and import_state: they serialize and deserialize the module's runtime state so it can be transferred to the replacement.
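To make the state-transfer contract concrete, here is a minimal sketch of what such a trait could look like. Only the method names `export_state` and `import_state` come from the text above; the signatures, the byte-vector state format, the `StateError` type, and the `TickCounter` toy module are assumptions for illustration, and the sketch uses `std` where a real kernel would likely be `no_std`.

```rust
/// Hypothetical shape of the hot-reload trait. The method names come from
/// the text; the signatures and the Vec<u8> state format are assumptions.
pub trait HotReloadableModule {
    /// Serialize the module's runtime state for transfer.
    fn export_state(&self) -> Vec<u8>;
    /// Restore state produced by a previous module's `export_state`.
    fn import_state(&mut self, state: &[u8]) -> Result<(), StateError>;
}

/// Hypothetical error type for failed state imports.
#[derive(Debug, PartialEq)]
pub enum StateError {
    /// The state blob did not have the expected length.
    BadLength,
}

/// Toy module whose entire runtime state is a single counter.
#[derive(Default)]
pub struct TickCounter {
    pub ticks: u64,
}

impl HotReloadableModule for TickCounter {
    fn export_state(&self) -> Vec<u8> {
        // Eight little-endian bytes are the whole state.
        self.ticks.to_le_bytes().to_vec()
    }

    fn import_state(&mut self, state: &[u8]) -> Result<(), StateError> {
        // Validate before touching `self`, so a failed import leaves the
        // module in its fresh state.
        if state.len() != 8 {
            return Err(StateError::BadLength);
        }
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(state);
        self.ticks = u64::from_le_bytes(bytes);
        Ok(())
    }
}
```

Validating the blob before mutating anything is a deliberate choice in this sketch: it keeps "import failed" equivalent to "module is still fresh", which is what a swap protocol would need in order to fall back cleanly.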
The registry manages slots — named positions where a module can be loaded. Each slot holds one active module at a time. The hot_swap method is the core operation — it atomically replaces the old module with a new one, transferring state in the process.
The hot_swap method follows a safe 5-step protocol. If step 3 (init) fails, the system rolls back to the old module. If state migration fails in step 4, the new module starts fresh.
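The rollback and start-fresh behaviour described above can be sketched as follows. This is not the Helix implementation: the text only identifies step 3 (init) and step 4 (state migration), so steps 1, 2, and 5 here are guesses, and `Module`, `Registry`, `SwapOutcome`, `state_of`, and `Counter` are hypothetical names. The sketch is single-threaded and does not attempt the atomicity a real kernel would need.

```rust
use std::collections::HashMap;

/// Assumed module interface; `init` is implied by "step 3 (init)".
pub trait Module {
    fn init(&mut self) -> Result<(), String>;
    fn export_state(&self) -> Vec<u8>;
    /// Must leave the module untouched when it returns Err.
    fn import_state(&mut self, state: &[u8]) -> Result<(), String>;
}

#[derive(Debug, PartialEq)]
pub enum SwapOutcome {
    /// New module is active and inherited the old state.
    Migrated,
    /// New module is active, but migration failed and it started fresh.
    StartedFresh,
    /// New module failed to initialize; the old module is still active.
    RolledBack,
}

/// Hypothetical registry: one active module per named slot.
#[derive(Default)]
pub struct Registry {
    slots: HashMap<String, Box<dyn Module>>,
}

impl Registry {
    pub fn load(&mut self, slot: &str, module: Box<dyn Module>) {
        self.slots.insert(slot.to_string(), module);
    }

    /// Exported state of the active module in `slot`, for inspection.
    pub fn state_of(&self, slot: &str) -> Option<Vec<u8>> {
        self.slots.get(slot).map(|m| m.export_state())
    }

    /// Returns None if the slot is empty or unknown.
    pub fn hot_swap(&mut self, slot: &str, mut replacement: Box<dyn Module>) -> Option<SwapOutcome> {
        // Steps 1-2 (assumed): find the active module and snapshot its state.
        let state = self.slots.get(slot)?.export_state();
        // Step 3: initialize the replacement. On failure the old module
        // was never removed, so "rollback" is simply returning early.
        if replacement.init().is_err() {
            return Some(SwapOutcome::RolledBack);
        }
        // Step 4: migrate state. On failure the replacement keeps its
        // freshly initialized state.
        let outcome = match replacement.import_state(&state) {
            Ok(()) => SwapOutcome::Migrated,
            Err(_) => SwapOutcome::StartedFresh,
        };
        // Step 5 (assumed): activate the replacement in the slot.
        self.slots.insert(slot.to_string(), replacement);
        Some(outcome)
    }
}

/// Toy module for demonstration: a counter that can be told to fail.
pub struct Counter {
    pub value: u64,
    pub fail_init: bool,
    pub fail_import: bool,
}

impl Module for Counter {
    fn init(&mut self) -> Result<(), String> {
        if self.fail_init { Err("init failed".into()) } else { Ok(()) }
    }

    fn export_state(&self) -> Vec<u8> {
        self.value.to_le_bytes().to_vec()
    }

    fn import_state(&mut self, state: &[u8]) -> Result<(), String> {
        if self.fail_import || state.len() != 8 {
            return Err("incompatible state".into());
        }
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(state);
        self.value = u64::from_le_bytes(bytes);
        Ok(())
    }
}
```

Keeping the old module in its slot until the replacement has passed init means a failed swap costs nothing: there is no window in which the slot is empty.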
The self-healing manager monitors all registered components and automatically attempts recovery when failures are detected. It uses an escalating recovery strategy — simple restart first, then hot-swap, then isolation, and finally escalation.
core/src/selfheal.rs

```rust
pub struct SelfHealingManager { /* ... */ }

impl SelfHealingManager {
    pub const fn new() -> Self; // No config needed
    pub fn register(&self, slot_id: SlotId, /* ... */); // Register by slot
    // ...
}
```
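The escalating strategy (restart, then hot-swap, then isolation, then escalation) can be modelled as a small ladder of actions. The sketch below is an illustration only: `RecoveryAction`, `next`, and `recover` are hypothetical names and not part of the `SelfHealingManager` API shown above, and the `attempt` callback stands in for whatever actually performs each action.

```rust
/// Recovery actions, in the order the text describes them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryAction {
    Restart,
    HotSwap,
    Isolate,
    Escalate,
}

impl RecoveryAction {
    /// The next, more drastic action; `Escalate` is the last rung.
    pub fn next(self) -> Option<RecoveryAction> {
        match self {
            RecoveryAction::Restart => Some(RecoveryAction::HotSwap),
            RecoveryAction::HotSwap => Some(RecoveryAction::Isolate),
            RecoveryAction::Isolate => Some(RecoveryAction::Escalate),
            RecoveryAction::Escalate => None,
        }
    }
}

/// Try each action in order until one succeeds.
/// `attempt` (hypothetical) performs the action and reports success.
/// Returns the action that worked, or None if every rung failed.
pub fn recover<F>(mut attempt: F) -> Option<RecoveryAction>
where
    F: FnMut(RecoveryAction) -> bool,
{
    let mut action = Some(RecoveryAction::Restart);
    while let Some(current) = action {
        if attempt(current) {
            return Some(current);
        }
        action = current.next();
    }
    None
}
```

For example, if restart and hot-swap both fail but isolation succeeds, `recover` calls the callback three times and returns `Some(RecoveryAction::Isolate)`.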
The most common hot-reload use case is swapping schedulers at runtime. This lets you change scheduling strategy without rebooting — switch from round-robin to priority-based when the workload changes.
core/src/hotreload/schedulers.rs

```rust
// Built-in scheduler implementations
pub struct RoundRobinScheduler { /* ... */ }
pub struct PriorityScheduler { /* ... */ }

// Both implement HotReloadableModule + Scheduler

// Create a slot for the scheduler (category only)
// ...
```
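As a rough illustration of switching from round-robin to priority-based scheduling while keeping pending work, the sketch below moves a run queue from one toy scheduler to another. Everything in it is an assumption: `Task`, the `Scheduler` trait methods, `ToyRoundRobin`, `ToyPriority`, and `swap_scheduler` are hypothetical stand-ins and do not reflect the internals of the built-in schedulers or the slot API.

```rust
use std::cmp::Reverse;
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Task {
    pub id: u32,
    pub priority: u8,
}

/// Hypothetical scheduler interface with a typed state hand-over.
pub trait Scheduler {
    fn enqueue(&mut self, task: Task);
    fn pick_next(&mut self) -> Option<Task>;
    /// Pending tasks, in queue order (stand-in for export_state).
    fn export_state(&self) -> Vec<Task>;
    /// Adopt pending tasks from a predecessor (stand-in for import_state).
    fn import_state(&mut self, tasks: Vec<Task>);
}

/// FIFO scheduler: tasks run in arrival order.
#[derive(Default)]
pub struct ToyRoundRobin {
    queue: VecDeque<Task>,
}

impl Scheduler for ToyRoundRobin {
    fn enqueue(&mut self, task: Task) {
        self.queue.push_back(task);
    }
    fn pick_next(&mut self) -> Option<Task> {
        self.queue.pop_front()
    }
    fn export_state(&self) -> Vec<Task> {
        self.queue.iter().copied().collect()
    }
    fn import_state(&mut self, tasks: Vec<Task>) {
        self.queue = tasks.into();
    }
}

/// Priority scheduler: highest priority first, ties in arrival order.
#[derive(Default)]
pub struct ToyPriority {
    tasks: Vec<Task>,
}

impl Scheduler for ToyPriority {
    fn enqueue(&mut self, task: Task) {
        self.tasks.push(task);
    }
    fn pick_next(&mut self) -> Option<Task> {
        // Reverse(index) makes the earliest task win among equal priorities.
        let idx = self
            .tasks
            .iter()
            .enumerate()
            .max_by_key(|(i, t)| (t.priority, Reverse(*i)))
            .map(|(i, _)| i)?;
        Some(self.tasks.remove(idx))
    }
    fn export_state(&self) -> Vec<Task> {
        self.tasks.clone()
    }
    fn import_state(&mut self, tasks: Vec<Task>) {
        self.tasks = tasks;
    }
}

/// Replace the active scheduler, carrying pending tasks across.
pub fn swap_scheduler(active: &mut Box<dyn Scheduler>, mut replacement: Box<dyn Scheduler>) {
    replacement.import_state(active.export_state());
    *active = replacement;
}
```

After the swap, tasks that were queued under the round-robin scheduler are still pending, but they are now picked in priority order rather than arrival order.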
The combination of hot-reload and self-healing means Helix can survive failures that would crash a traditional kernel. A buggy driver gets isolated, a fresh copy is loaded, state is restored, and users barely notice a hiccup of roughly 100 ms. This is the foundation for Helix's reliability story.