hyle: Salience-Aware Context Management for Autonomous Code Assistants

A Rust-native implementation with multi-tier compression and cognitive architecture

Technical Report · v0.3.3 · 2024

Abstract. We present hyle, an autonomous code assistant implemented in Rust that addresses context window limitations through salience-aware tiered compression. The system employs a multi-model cognitive architecture where specialized models handle distinct phases of the execution loop. We describe the agentic execution model, safety mechanisms, and session persistence layer. Empirical evaluation shows the system maintains coherent multi-file refactoring sessions while staying within token budgets. The implementation passes 364 tests covering safety invariants, tool execution, and context management.

Introduction

Large language models excel at code generation but struggle with extended development sessions. Context windows, while expanding (from 4K to 200K+ tokens), remain finite. Previous approaches either truncate history aggressively—losing critical context about design decisions—or maintain full context at prohibitive cost.

hyle addresses this through salience-aware context management: a four-tier system that prioritizes recent, error-containing, and task-relevant content while compressing or discarding peripheral information. The key insight is that not all context is equally valuable: a compilation error from 30 seconds ago is more salient than a successful file read from 10 minutes ago.

The remainder of this paper describes the system architecture, the salience-aware context management scheme, the safety mechanisms, and an empirical evaluation.

System Architecture

The system is organized into six primary modules, shown in Figure 1; each component's responsibilities are detailed below.

┌─────────────────────────────────────────────────────────────┐
│                           main.rs                           │
│                    CLI parsing, dispatch                    │
└───────────────┬─────────────────────────────┬───────────────┘
                │                             │
    ┌───────────▼───────────┐      ┌──────────▼──────────┐
    │         ui.rs         │      │      server.rs      │
    │    TUI event loop     │      │      HTTP API       │
    └───────────┬───────────┘      └──────────┬──────────┘
                │                             │
    ┌───────────▼─────────────────────────────▼──────────┐
    │                      agent.rs                      │
    │            Tool parsing, execution loop            │
    └───────────┬─────────────────────────────┬──────────┘
                │                             │
    ┌───────────▼───────────┐      ┌──────────▼──────────┐
    │       tools.rs        │      │      client.rs      │
    │    File ops, bash     │      │   OpenRouter SSE    │
    └───────────────────────┘      └─────────────────────┘
Figure 1: Module dependency graph. Arrows indicate function calls.

main.rs - Entry Point

Parses CLI arguments using clap, initializes logging, and dispatches to either TUI mode or HTTP server mode. Handles configuration loading from ~/.config/hyle/config.json and environment variable overrides.

pub fn main() -> Result<()> {
    let cli = Cli::parse();
    init_logging(cli.verbose)?;

    match cli.mode {
        Mode::Interactive => ui::run(cli.into())?,
        Mode::Server { port } => server::run(port)?,
        Mode::Once { prompt } => agent::run_once(&prompt)?,
    }
    Ok(())
}
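The configuration precedence described above (file, then environment overrides) can be sketched as follows. The `Config` fields, the default values, and the `HYLE_MODEL` variable name are illustrative assumptions; the report does not show the actual schema of ~/.config/hyle/config.json.

```rust
use std::env;

// Hypothetical shape of the loaded configuration; the real schema
// in ~/.config/hyle/config.json is not shown in this report.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    model: String,
    max_iterations: u32,
}

// Layered precedence: built-in defaults, then the config-file value,
// then an environment-variable override (HYLE_MODEL is an assumed name).
fn load_config(file_model: Option<&str>) -> Config {
    let mut cfg = Config {
        model: "default-model".to_string(),
        max_iterations: 25,
    };
    if let Some(m) = file_model {
        cfg.model = m.to_string();
    }
    if let Ok(m) = env::var("HYLE_MODEL") {
        cfg.model = m;
    }
    cfg
}
```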

ui.rs - Terminal User Interface

Implements a 20Hz event loop using ratatui. Non-blocking design polls keyboard input, background task completion, and API streaming simultaneously. Maintains render state separately from application state for flicker-free updates.

loop {
    // Poll at 50ms intervals (~20Hz)
    if event::poll(Duration::from_millis(50))? {
        handle_input(event::read()?)?;
    }
    // Check background tasks
    while let Ok(msg) = bg_rx.try_recv() {
        process_background_result(msg)?;
    }
    // Render current state
    terminal.draw(|f| render_ui(f, &state))?;
}

agent.rs - Execution Loop

Core agentic loop that parses tool calls from model output, executes them, and feeds results back. Implements the "think-act-observe" cycle with configurable iteration limits and stuck detection.

while iterations < MAX_ITERATIONS {
    let response = client.complete(&messages).await?;

    if let Some(tools) = parse_tool_calls(&response) {
        let results = execute_tools(tools).await?;
        messages.push(tool_results_message(results));
    } else {
        // No tools = task complete
        break;
    }
    iterations += 1;
}

tools.rs - Tool Implementations

Five core tools: read, write, patch, bash, and search. All file operations use atomic semantics. Bash commands are checked against a blocklist before execution.

pub enum Tool {
    Read { path: PathBuf },
    Write { path: PathBuf, content: String },
    Patch { path: PathBuf, search: String, replace: String },
    Bash { command: String, timeout: Option<u64> },
    Search { pattern: String, path: Option<PathBuf> },
}
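Dispatch over the enum is a single match. The sketch below covers two of the five variants with a string-typed result; the real tool-result type and error handling are assumptions.

```rust
use std::fs;
use std::path::PathBuf;

// Two of the five tool variants, enough to show the dispatch shape.
enum Tool {
    Read { path: PathBuf },
    Write { path: PathBuf, content: String },
}

// String-typed result is a simplification of the real tool-result type.
fn execute(tool: &Tool) -> Result<String, String> {
    match tool {
        Tool::Read { path } => fs::read_to_string(path).map_err(|e| e.to_string()),
        Tool::Write { path, content } => fs::write(path, content)
            .map(|_| format!("wrote {} bytes", content.len()))
            .map_err(|e| e.to_string()),
    }
}
```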

client.rs - API Client

Server-Sent Events (SSE) streaming client for OpenRouter API. Handles rate limiting with exponential backoff, automatic model fallback, and token counting for budget management.

pub async fn stream_completion(
    &self,
    messages: &[Message],
) -> Result<impl Stream<Item = Result<Event>>> {
    let req = self.build_request(messages)?;

    let response = reqwest::Client::new()
        .post(&self.endpoint)
        .headers(self.headers())
        .json(&req)
        .send()
        .await?;

    Ok(response.bytes_stream().map(parse_sse_event))
}
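The rate-limiting behavior mentioned above is not shown in the streaming code. A minimal exponential-backoff schedule might look like the following; the 500ms base and 30s cap are assumed defaults, not values taken from the implementation.

```rust
use std::time::Duration;

// Delay before retry `attempt` (0-indexed): base * 2^attempt, capped.
// The 500ms base and 30s ceiling are illustrative assumptions.
fn backoff_delay(attempt: u32) -> Duration {
    let base_ms: u64 = 500;
    let scaled = base_ms.saturating_mul(1u64 << attempt.min(6));
    Duration::from_millis(scaled.min(30_000))
}
```

Capping the shift at 6 and the delay at 30 seconds keeps long retry chains from overflowing or stalling the session indefinitely.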

server.rs - HTTP API

Optional REST API for IDE integrations. Exposes endpoints for session management, prompt submission, and status queries. Uses axum with tower middleware for request logging.

let app = Router::new()
    .route("/v1/chat", post(handle_chat))
    .route("/v1/sessions", get(list_sessions))
    .route("/v1/sessions/:id", get(get_session))
    .route("/health", get(|| async { "ok" }))
    .layer(TraceLayer::new_for_http());

2.1 Threading Model

The system uses a hybrid async/sync architecture. The main event loop is synchronous for predictable TUI timing, while API calls and file I/O use Tokio's async runtime via spawn_blocking.

Algorithm 1: Main Event Loop
loop {
  // Phase 1: Collect events (non-blocking)
  events ← poll_all_sources(timeout: 50ms)
  
  // Phase 2: Update state
  for event in events:
    state ← reduce(state, event)
  
  // Phase 3: Render (pure function of state)
  frame ← render(state)
  terminal.draw(frame)
}
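The synchronous loop receives results from background work over a channel polled with try_recv, so rendering never blocks. In the sketch below a plain std::thread stands in for the Tokio runtime that the real system uses; the message type and function names are illustrative.

```rust
use std::sync::mpsc;
use std::thread;

// Long-running work is pushed to a worker; results come back over a
// channel. A std::thread stands in here for Tokio's spawn_blocking.
fn spawn_background(tx: mpsc::Sender<String>) {
    thread::spawn(move || {
        // ... slow work (API call, file I/O) would happen here ...
        let _ = tx.send("task complete".to_string());
    });
}

// Phase 1 of the event loop: drain completed results without blocking.
fn drain_results(rx: &mpsc::Receiver<String>) -> Vec<String> {
    let mut out = Vec::new();
    while let Ok(msg) = rx.try_recv() {
        out.push(msg);
    }
    out
}
```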

Context Management

Context is allocated across four tiers based on computed salience scores. The allocation adapts dynamically based on task phase and error state.

Tier         Budget   Content                                   Compression
Focus        40%      Current task, last tool results, errors   None
Recent       30%      Last 2-3 exchanges, active decisions      Light
Summary      20%      Older exchanges, key facts                Heavy
Background   10%      Project structure, conventions            Minimal

Definition 1: Salience Score

For a message m with age t (seconds since creation), the salience score S(m) is computed as:

S(m) = w_r · R(t) + w_e · E(m) + w_k · K(m) + w_f · F(m)

where:
  R(t) = exp(-t / τ)                        // Recency: exponential decay, τ = 300s
  E(m) = 1 if contains_error(m), else 0     // Error boost: errors are highly salient
  K(m) = |keywords(m) ∩ task|               // Keyword overlap with current task
  F(m) = 1 if m references a focused file, else 0

Default weights: w_r = 0.4, w_e = 0.3, w_k = 0.2, w_f = 0.1
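Definition 1 translates directly to code. The sketch below uses the default weights; the `Msg` struct is a stand-in for the real message type, which carries the fields these predicates are computed from.

```rust
// Stand-in for the real message type: just the inputs to Definition 1.
struct Msg {
    age_secs: f64,        // t: seconds since creation
    has_error: bool,      // contains_error(m)
    keyword_overlap: f64, // |keywords(m) ∩ task|
    references_focus: bool,
}

// S(m) = w_r·R(t) + w_e·E(m) + w_k·K(m) + w_f·F(m), default weights.
fn salience(m: &Msg) -> f64 {
    let (w_r, w_e, w_k, w_f) = (0.4, 0.3, 0.2, 0.1);
    let tau = 300.0;
    let r = (-m.age_secs / tau).exp();
    let e = if m.has_error { 1.0 } else { 0.0 };
    let f = if m.references_focus { 1.0 } else { 0.0 };
    w_r * r + w_e * e + w_k * m.keyword_overlap + w_f * f
}
```

This reproduces the motivating example from the introduction: a 30-second-old error scores well above a 10-minute-old successful read.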

Messages are sorted by salience score and allocated to tiers in descending order until each tier's token budget is exhausted.

Algorithm 2: Tiered Compression

fn compress_context(messages: Vec<Message>, budget: usize) -> Vec<Message> {
    let mut output = Vec::new();
    let mut remaining = budget;

    // Sort by descending salience (scores are floats, so partial_cmp)
    let mut sorted = messages;
    sorted.sort_by(|a, b| {
        salience(b).partial_cmp(&salience(a)).unwrap_or(std::cmp::Ordering::Equal)
    });

    for msg in sorted {
        let tier = assign_tier(&msg, &output);
        let compressed = match tier {
            Tier::Focus => msg.clone(),  // No compression
            Tier::Recent => light_compress(&msg),
            Tier::Summary => summarize(&msg),
            Tier::Background => extract_facts(&msg),
        };

        let tokens = count_tokens(&compressed);
        if tokens <= remaining {
            output.push(compressed);
            remaining -= tokens;
        }
    }
    output
}

3.1 Compression Strategies

Each tier uses a different compression strategy, matching the dispatch in Algorithm 2: Focus content is kept verbatim, Recent content is lightly compressed, Summary content is replaced by generated summaries, and Background content is reduced to extracted key facts.

Safety Mechanisms

Given the autonomous nature of the system, safety is paramount. We implement defense in depth with multiple layers.

4.1 Command Blocklist

const BLOCKED_PATTERNS: &[&str] = &[
    "rm -rf /", "rm -r /", "rm --recursive /",
    ":(){ :|:& };:",           // fork bomb
    "dd if=/dev/zero",         // disk overwrite
    "dd if=/dev/random",
    "mkfs.",                   // filesystem format
    "chmod -R 777 /",          // permission disasters
    "> /dev/sda",              // direct disk write
    "curl | sh", "wget | sh",  // remote code execution
    "curl | bash", "wget | bash",
];
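Matching against the blocklist can be done with a substring check over a whitespace-normalized command, so that extra spaces cannot defeat a pattern. The normalization step is an assumption; the report does not state how hyle canonicalizes commands before matching.

```rust
// Subset of the blocklist in §4.1, enough to exercise the check.
const BLOCKED: &[&str] = &["rm -rf /", "mkfs.", "curl | sh"];

// Collapse runs of whitespace, then look for any blocked substring.
// Normalization is an assumed hardening step, not confirmed behavior.
fn is_blocked(command: &str) -> bool {
    let normalized = command.split_whitespace().collect::<Vec<_>>().join(" ");
    BLOCKED.iter().any(|p| normalized.contains(p))
}
```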

4.2 Atomic File Operations

All file writes follow an atomic protocol to prevent partial writes and enable recovery:

Algorithm 3: Atomic Write Protocol
function atomic_write(path, content):
  // 1. Write to temporary file
  temp ← path + ".tmp." + random_hex(8)
  write_file(temp, content)
  
  // 2. Sync to disk
  fsync(temp)
  
  // 3. Create timestamped backup
  if exists(path):
    backup ← path + "." + timestamp() + ".bak"
    rename(path, backup)
  
  // 4. Atomic rename
  rename(temp, path)
  
  // 5. Verify write
  verify ← read_file(path)
  assert verify == content
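Algorithm 3 maps cleanly onto std::fs, since rename is atomic within a filesystem. The sketch below simplifies step 3 to a fixed ".bak" suffix and step 1 to a fixed ".tmp" suffix, where the real protocol uses a timestamp and random hex respectively.

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

// Rust sketch of Algorithm 3. Fixed ".tmp"/".bak" suffixes are
// simplifications of the random-hex and timestamped names above.
fn atomic_write(path: &Path, content: &str) -> std::io::Result<()> {
    // 1-2. Write to a sibling temp file and sync it to disk.
    let temp = path.with_extension("tmp");
    let mut f = File::create(&temp)?;
    f.write_all(content.as_bytes())?;
    f.sync_all()?;

    // 3. Back up the existing file, if any.
    if path.exists() {
        fs::rename(path, path.with_extension("bak"))?;
    }

    // 4. Atomic rename into place (same filesystem, so no partial state).
    fs::rename(&temp, path)?;

    // 5. Verify the write round-trips.
    assert_eq!(fs::read_to_string(path)?, content);
    Ok(())
}
```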

4.3 Loop Detection

The validator model monitors for stuck states by comparing recent tool calls. If the same operation is attempted 3+ times without progress, the system surfaces a clarifying question.
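A minimal version of this check compares the trailing window of tool calls for exact repetition. Representing a call as a plain string and fixing the window to the threshold are simplifications; the real validator presumably compares structured calls and tracks progress signals.

```rust
// True when the last `threshold` tool calls are identical, i.e. the
// agent is retrying the same operation without making progress.
// String-typed calls are a simplification of the real representation.
fn is_stuck(recent_calls: &[String], threshold: usize) -> bool {
    if recent_calls.len() < threshold {
        return false;
    }
    let last = &recent_calls[recent_calls.len() - 1];
    recent_calls.iter().rev().take(threshold).all(|c| c == last)
}
```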

Evaluation

Metric             Value       Notes
Test coverage      364 tests   Unit, integration, and property tests
Binary size        ~10MB       Release build, stripped
Startup time       <250ms      Cold start to interactive prompt
TUI refresh rate   20Hz        50ms polling interval
Memory usage       ~30MB       Idle, single session
Supported models   35+         Via OpenRouter

5.1 Context Efficiency

In a 2-hour refactoring session involving 47 files, the salience-aware compression maintained task coherence while using only 38% of the naive full-context approach's token budget.
