Hot-Reload & Self-Healing

Live module swapping with state migration, escalating recovery, and a self-healing manager.

One of Helix's most distinctive features is the ability to replace kernel modules at runtime without rebooting, combined with an automated self-healing system that detects failures and attempts recovery. Together, these features mean the kernel can fix itself — a crashed scheduler gets restarted, a buggy driver gets hot-swapped with a fresh version, all while the rest of the system keeps running.

HotReloadableModule Trait

Any module that supports live replacement must implement this trait. The key methods are export_state and import_state: export_state snapshots the module's runtime state as a boxed ModuleState, and import_state restores that snapshot in the replacement module.

core/src/hotreload/mod.rs

rust

pub trait HotReloadableModule: Send + Sync {

  fn name(&self) -> &'static str;

  fn version(&self) -> ModuleVersion;

  fn category(&self) -> ModuleCategory;

  // Lifecycle

  fn init(&mut self) -> Result<(), HotReloadError>;

  fn prepare_unload(&mut self) -> Result<(), HotReloadError>;

  // Hot-reload protocol — these enable live replacement

  fn export_state(&self) -> Option<Box<dyn ModuleState>>;  // Snapshot state

  fn import_state(&mut self, state: &dyn ModuleState)       // Restore state

      -> Result<(), HotReloadError>;

  fn can_unload(&self) -> bool;           // Safe to remove right now?

  fn as_any(&self) -> &dyn Any;           // Downcast support

  fn as_any_mut(&mut self) -> &mut dyn Any;

}

#[repr(u8)]

pub enum ModuleCategory {

  Scheduler = 0,  MemoryAllocator = 1, Filesystem = 2,

  Driver    = 3,  Network = 4,         Security   = 5,

  Ipc       = 6,  Custom  = 255,

}
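As an illustration of the export_state / import_state pair, here is a minimal sketch of a state round trip. It is not the Helix implementation: `StateTransfer` is a reduced stand-in for HotReloadableModule, and the `ModuleState` definition, the `StateMismatch` error variant, `RunQueueState`, and `ToyScheduler` are all assumptions made for the example. It uses `std` so it can run on a host, whereas the kernel itself presumably builds without it.

```rust
use std::any::Any;

// Hypothetical stand-in: the real ModuleState may be defined differently.
pub trait ModuleState: Send + Sync {
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical error type with a single illustrative variant.
#[derive(Debug, PartialEq)]
pub enum HotReloadError {
    StateMismatch,
}

// Reduced version of HotReloadableModule: only the state-transfer methods.
pub trait StateTransfer {
    fn export_state(&self) -> Option<Box<dyn ModuleState>>;
    fn import_state(&mut self, state: &dyn ModuleState) -> Result<(), HotReloadError>;
}

// Illustrative snapshot type: the IDs of the tasks waiting to run.
pub struct RunQueueState {
    pub tasks: Vec<u64>,
}

impl ModuleState for RunQueueState {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

pub struct ToyScheduler {
    pub queue: Vec<u64>,
}

impl StateTransfer for ToyScheduler {
    // Snapshot the run queue into a boxed state object.
    fn export_state(&self) -> Option<Box<dyn ModuleState>> {
        Some(Box::new(RunQueueState { tasks: self.queue.clone() }))
    }

    // Accept only a snapshot of the expected concrete type.
    fn import_state(&mut self, state: &dyn ModuleState) -> Result<(), HotReloadError> {
        let snapshot = state
            .as_any()
            .downcast_ref::<RunQueueState>()
            .ok_or(HotReloadError::StateMismatch)?;
        self.queue = snapshot.tasks.clone();
        Ok(())
    }
}
```

In this sketch the replacement rejects any snapshot it does not recognize, which lets the caller decide whether to start fresh instead.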

Hot-Reload Registry

The registry manages slots — named positions where a module can be loaded. Each slot holds one active module at a time. The hot_swap method is the core operation — it atomically replaces the old module with a new one, transferring state in the process.

core/src/hotreload/mod.rs

rust

pub struct HotReloadRegistry { /* ... */ }

impl HotReloadRegistry {

  pub fn create_slot(&self, category: ModuleCategory) -> SlotId;

  pub fn load_module(&self, slot: SlotId,

      module: Box<dyn HotReloadableModule>) -> Result<(), HotReloadError>;

  pub fn unload_module(&self, slot: SlotId) -> Result<(), HotReloadError>;

  pub fn slot_status(&self, slot: SlotId) -> Option<SlotStatus>;

  // Core operation — atomic live swap with state transfer

  pub fn hot_swap(&self, slot: SlotId,

      new: Box<dyn HotReloadableModule>) -> Result<(), HotReloadError>;

  // Access the active module safely (typed via downcast)

  pub fn with_module<T: 'static, F, R>(&self, slot: SlotId, f: F) -> Option<R>

      where F: FnOnce(&T) -> R;

  pub fn with_module_mut<T: 'static, F, R>(&self, slot: SlotId, f: F) -> Option<R>

      where F: FnOnce(&mut T) -> R;

  pub fn list_slots(&self)

      -> Vec<(SlotId, ModuleCategory, SlotStatus, Option<&'static str>)>;

}

#[repr(u8)]

pub enum SlotStatus {

  Empty = 0, Loading = 1, Active = 2,

  Unloading = 3, Swapping = 4, Failed = 5,

}
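The typed accessors rely on downcasting through as_any. The following sketch shows one way with_module could behave, under stated assumptions: `MiniRegistry`, the reduced `Module` trait, the `u32` slot ID, and the `RoundRobin` struct with its `quantum_ms` field are hypothetical, and `std::sync::Mutex` stands in for whatever lock the kernel actually uses.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::sync::Mutex;

// Reduced stand-in for HotReloadableModule: only downcast support.
pub trait Module: Send + Sync {
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical slot identifier; the real SlotId type is not shown in this page.
pub type SlotId = u32;

pub struct MiniRegistry {
    slots: Mutex<HashMap<SlotId, Box<dyn Module>>>,
}

impl MiniRegistry {
    pub fn new() -> Self {
        Self { slots: Mutex::new(HashMap::new()) }
    }

    pub fn load_module(&self, slot: SlotId, module: Box<dyn Module>) {
        self.slots.lock().unwrap().insert(slot, module);
    }

    // Runs `f` on the active module if the slot is occupied and the module
    // has concrete type `T`; returns None otherwise.
    pub fn with_module<T: 'static, F, R>(&self, slot: SlotId, f: F) -> Option<R>
    where
        F: FnOnce(&T) -> R,
    {
        let slots = self.slots.lock().unwrap();
        let module = slots.get(&slot)?;
        let typed = module.as_any().downcast_ref::<T>()?;
        Some(f(typed))
    }
}

// Illustrative module type.
pub struct RoundRobin {
    pub quantum_ms: u32,
}

impl Module for RoundRobin {
    fn as_any(&self) -> &dyn Any {
        self
    }
}
```

Because the closure runs while the lock is held, the module cannot be swapped out from under the caller in the middle of the access. A call such as `registry.with_module::<RoundRobin, _, _>(slot, |s| s.quantum_ms)` yields `None` for an empty slot or a module of a different type.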

Hot-Swap Protocol

The hot_swap method follows a five-step protocol. Failures in steps 1 and 2 do not abort the swap. If step 3 (init) fails, the system rolls back to the old module. If state migration fails in step 4, the swap still completes and the new module starts fresh.

Figure: Hot-Swap Protocol (interactive diagram; the steps are tabulated below)

Step	Action	On Failure
1	Call `old.export_state()` — snapshot runtime state	Continue without state
2	Call `old.prepare_unload()` — drain pending work	Continue (committed to swap)
3	Call `new.init()` — initialize new module	Rollback — restore old module
4	Call `new.import_state(state)` — migrate state	Continue (new module starts fresh)
5	Activate new module — slot becomes `Active`	—
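The table above can be sketched as a single function. This is a simplified illustration, not the registry's actual code: state is passed as raw bytes instead of a boxed ModuleState, errors are reduced to `()`, and `SwapError` and the `Counter` module are hypothetical. Rollback is modeled by leaving the old module in the slot; the real rollback may do more, such as re-initializing the old module.

```rust
#[derive(Debug, PartialEq)]
pub enum SwapError {
    InitFailed,
}

// Reduced module interface covering only the calls used by the protocol.
pub trait Module {
    fn export_state(&self) -> Option<Vec<u8>>;
    fn prepare_unload(&mut self) -> Result<(), ()>;
    fn init(&mut self) -> Result<(), ()>;
    fn import_state(&mut self, state: &[u8]) -> Result<(), ()>;
}

pub fn hot_swap(slot: &mut Box<dyn Module>, mut new: Box<dyn Module>) -> Result<(), SwapError> {
    // Step 1: snapshot the old module's state (None means "no state").
    let state = slot.export_state();
    // Step 2: drain pending work; a failure here does not stop the swap.
    let _ = slot.prepare_unload();
    // Step 3: initialize the replacement; on failure the old module stays.
    if new.init().is_err() {
        return Err(SwapError::InitFailed);
    }
    // Step 4: migrate state; on failure the new module simply starts fresh.
    if let Some(bytes) = state.as_deref() {
        let _ = new.import_state(bytes);
    }
    // Step 5: activate the new module.
    *slot = new;
    Ok(())
}

// Illustrative module whose whole state is one counter byte.
pub struct Counter {
    pub count: u8,
    pub init_ok: bool,
}

impl Module for Counter {
    fn export_state(&self) -> Option<Vec<u8>> {
        Some(vec![self.count])
    }
    fn prepare_unload(&mut self) -> Result<(), ()> {
        Ok(())
    }
    fn init(&mut self) -> Result<(), ()> {
        if self.init_ok { Ok(()) } else { Err(()) }
    }
    fn import_state(&mut self, state: &[u8]) -> Result<(), ()> {
        self.count = *state.first().ok_or(())?;
        Ok(())
    }
}
```

Only step 3 can abort the swap in this sketch, which matches the "On Failure" column: every other failure is tolerated because the caller has already committed to the replacement.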

Self-Healing Manager

The self-healing manager monitors all registered components and automatically attempts recovery when failures are detected. It uses an escalating recovery strategy: restart the module in place first, then fail over to a fresh instance built by the registered factory, and finally escalate to a kernel panic.

core/src/selfheal.rs

rust

pub struct SelfHealingManager { /* ... */ }

impl SelfHealingManager {

  pub const fn new() -> Self;             // No config needed

  pub fn register(&self, slot_id: SlotId, // Register by slot

      name: &str, factory: Option<ModuleFactory>);

  pub fn report_failure(&self, slot_id: SlotId);

  pub fn tick(&self);                     // Called on timer

  pub fn stats(&self) -> RecoveryStats;

  pub fn events(&self) -> Vec<RecoveryEvent>;

}

#[repr(u8)]

pub enum HealthStatus {

  Healthy      = 0,  // Responding normally

  Degraded     = 1,  // Functional but impaired

  Unresponsive = 2,  // Potential hang

  Crashed      = 3,  // Module crashed

  Recovering   = 4,  // Recovery in progress

  Unknown      = 255, // Not monitored

}

pub enum RecoveryAction {

  None,       // No action needed

  Restart,    // Re-init the module in-place

  Failover,   // Replace via factory function

  Panic,      // Unrecoverable — escalate to kernel panic

}

pub struct RecoveryEvent {

  pub tick: u64,

  pub slot_id: SlotId,

  pub module_name: String,

  pub event_type: RecoveryEventType,

  pub success: bool,

}

Escalating Recovery Strategy

The manager tracks how many times each component has crashed and escalates the recovery strategy automatically.

Figure: Escalating Recovery (interactive diagram; the escalation levels are tabulated below)

Crash Count	Action	What Happens
1st	`Restart`	Re-init the module in place
2nd	`Restart`	Second attempt
3rd	`Failover`	Replace via factory function (fresh instance)
4th+	`Panic`	Unrecoverable — escalate to kernel panic handler
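The escalation table maps directly onto a crash counter per slot. The sketch below assumes exactly the thresholds listed above; `action_for`, `CrashTracker`, and the `u32` slot ID are hypothetical names, and the page does not say what happens at the failover stage when a slot was registered without a factory, so that case is not modeled.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoveryAction {
    None,
    Restart,
    Failover,
    Panic,
}

// Maps the number of crashes seen so far to the action from the table.
pub fn action_for(crash_count: u32) -> RecoveryAction {
    match crash_count {
        0 => RecoveryAction::None,
        1 | 2 => RecoveryAction::Restart,
        3 => RecoveryAction::Failover,
        _ => RecoveryAction::Panic,
    }
}

// Tracks crash counts per slot (slot IDs are plain u32 in this sketch).
pub struct CrashTracker {
    counts: HashMap<u32, u32>,
}

impl CrashTracker {
    pub fn new() -> Self {
        Self { counts: HashMap::new() }
    }

    // Records one more crash for the slot and returns the action to take.
    pub fn report_failure(&mut self, slot_id: u32) -> RecoveryAction {
        let count = self.counts.entry(slot_id).or_insert(0);
        *count += 1;
        action_for(*count)
    }
}
```

Counts are kept per slot, so a crashing driver does not push an unrelated scheduler slot closer to a panic.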

Integration Example

Here's how the self-healing system integrates with the rest of the kernel in practice.

core/src/selfheal.rs

rust

// During kernel initialization

let heal = SelfHealingManager::new(); // const fn, no config needed

// Register modules by slot ID + optional factory for replacement

heal.register(sched_slot, "scheduler", Some(|| Box::new(RoundRobinScheduler::new())));

heal.register(fs_slot, "filesystem", None); // No auto-replacement

// In the timer interrupt handler:

// heal.tick() runs automatically and:

//   1. Checks health of each registered slot

//   2. If failed → attempt recovery (Restart / Failover / Panic)

//   3. Logs RecoveryEvents and updates stats

// Query recovery stats at any time

let stats = heal.stats();

log::info!("Health: {}%, recoveries: {}/{}",

  stats.system_health,

  stats.successful_recoveries,

  stats.failures_detected,

);

Live Scheduler Swap Example

The most common hot-reload use case is swapping schedulers at runtime. This lets you change scheduling strategy without rebooting — switch from round-robin to priority-based when the workload changes.

core/src/hotreload/schedulers.rs

rust

// Built-in scheduler implementations

pub struct RoundRobinScheduler { /* ... */ }

pub struct PriorityScheduler { /* ... */ }

// Both implement HotReloadableModule + Scheduler

// Create a slot for the scheduler (category only)

let slot = registry.create_slot(ModuleCategory::Scheduler);

// Load the initial scheduler

registry.load_module(slot, Box::new(RoundRobinScheduler::new()))?;

// Later — swap to priority scheduling at runtime

// State is exported via ModuleState, migrated to the new module

registry.hot_swap(slot, Box::new(PriorityScheduler::new()))?;

The combination of hot-reload and self-healing means Helix can survive failures that would crash a traditional kernel. A buggy driver gets isolated, a fresh copy is loaded, state is restored, and users never notice the 100ms hiccup. This is the foundation for Helix's reliability story.
