How the Life Savor Runtime Works
The Life Savor agent is a single binary that runs on your machine. It connects to the platform, manages AI models, executes skills, and keeps everything private by default. Here's how it all fits together.
Zero Built-In Models
The agent ships with no model providers baked in. Every model — whether it's GPT-4o via API or a local LLaMA running on your GPU — is delivered as an installable component from the marketplace.
This means the base agent is small and fast to install. You add only the capabilities you need.
The Component Architecture
When you install a model component, it registers with the agent's Integration Registry. The registry tracks each provider's lifecycle: registration, health status, capabilities, and routing.
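To make that concrete, here's a minimal sketch of what such a registry might track. Everything here is illustrative; the class names, the `kind` field, and the health states are my assumptions, not the agent's actual API:

```python
from dataclasses import dataclass
from enum import Enum

class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNAVAILABLE = "unavailable"

@dataclass
class ProviderEntry:
    name: str
    kind: str                   # "local", "gateway", or "byok" (see the table below)
    capabilities: set[str]      # e.g. {"chat", "embeddings"}
    health: Health = Health.UNAVAILABLE

class IntegrationRegistry:
    """Tracks installed model providers and their lifecycle state."""

    def __init__(self) -> None:
        self._providers: dict[str, ProviderEntry] = {}

    def register(self, entry: ProviderEntry) -> None:
        self._providers[entry.name] = entry

    def set_health(self, name: str, health: Health) -> None:
        self._providers[name].health = health

    def healthy_providers(self, capability: str) -> list[ProviderEntry]:
        return [p for p in self._providers.values()
                if p.health is Health.HEALTHY and capability in p.capabilities]
```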
When you send a message to your agent, the Inference Bridge routes your request to the right model provider based on what's installed and healthy. If you have a local model loaded, it goes there. If you're using a cloud API, it routes through the gateway.
You → Agent → Inference Bridge → Registered Model Provider → Response
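In code, that routing layer can be a thin wrapper over the registry. A sketch reusing the hypothetical types above; the preference for local providers is my assumption, based on the privacy-by-default framing:

```python
class InferenceBridge:
    """Routes each request to a healthy provider, preferring local models."""

    def __init__(self, registry: IntegrationRegistry) -> None:
        self.registry = registry

    def route(self, request: dict) -> ProviderEntry:
        candidates = self.registry.healthy_providers("chat")
        if not candidates:
            raise RuntimeError("no healthy model provider installed")
        local = [p for p in candidates if p.kind == "local"]
        return local[0] if local else candidates[0]
```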
Three Ways to Run a Model
Model components follow one of three patterns:
| Pattern | How it works | Example |
|---|---|---|
| API Gateway | Routes through the Life Savor API for centralized billing | GPT-4o, Claude, Gemini |
| Local | Runs on your hardware via PyTorch or ONNX Runtime | LLaMA 3, Mistral 7B, Phi-3 |
| BYOK | Uses your own API key, calls the vendor directly | GPT-4o-BYOK, Claude-BYOK |
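As a rough illustration, here's what a component declaration might look like under each pattern. The field names are invented for this post; the real manifest schema may differ:

```python
# Hypothetical component manifests, one per pattern (field names are illustrative).
gateway_model = {"name": "gpt-4o", "pattern": "api_gateway"}        # billed centrally
local_model = {"name": "mistral-7b", "pattern": "local",
               "runtime": "onnx", "weights": "mistral-7b.onnx"}     # runs on your GPU
byok_model = {"name": "gpt-4o-byok", "pattern": "byok",
              "api_key_secret": "openai_api_key"}                   # key from your vault
```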
Local Model Execution
For local models, the agent includes a dual-runtime engine supporting both PyTorch and ONNX Runtime. It automatically detects your hardware — CUDA on NVIDIA GPUs, Metal Performance Shaders on Apple Silicon, DirectML on Windows — and picks the fastest path.
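The detection order might look roughly like this. The library calls are real PyTorch and ONNX Runtime APIs, but the fallback order and the function name are my assumptions:

```python
import torch
import onnxruntime as ort

def pick_device() -> str:
    """Probe available hardware, fastest path first."""
    if torch.cuda.is_available():                                  # NVIDIA GPU
        return "cuda"
    if torch.backends.mps.is_available():                          # Apple Silicon
        return "mps"
    if "DmlExecutionProvider" in ort.get_available_providers():    # DirectML on Windows
        return "dml"
    return "cpu"                                                   # safe fallback
```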
Models move through three states:
- Hot — loaded in memory, ready for instant inference
- Warm — partially loaded, fast to activate
- Cold — on disk, needs to be loaded before use
The agent manages these transitions automatically based on usage patterns and available memory.
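Here's one plausible shape for that policy. The promotion and demotion rules below are guesses, not the agent's documented heuristics:

```python
from enum import Enum, auto

class ModelState(Enum):
    COLD = auto()   # weights on disk
    WARM = auto()   # partially loaded, e.g. memory-mapped
    HOT = auto()    # fully resident in memory

def on_request(state: ModelState) -> ModelState:
    """Any inference request promotes the model straight to hot."""
    return ModelState.HOT

def on_memory_pressure(state: ModelState) -> ModelState:
    """Demote one step at a time: hot models drop to warm before eviction."""
    return ModelState.WARM if state is ModelState.HOT else ModelState.COLD
```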
Skill Execution
Skills are sandboxed extensions that give your agent new capabilities — think of them as tools. A skill might fetch your calendar, summarize a document, or query a database.
When the agent invokes a skill, it:
- Spawns the skill as a child process
- Sends a JSON request to the skill's stdin
- Reads the JSON response from stdout
- Enforces a timeout (30 seconds by default)
- Caps output at 1 MB to prevent runaway output
The skill runs in a restricted environment — no inherited environment variables, limited filesystem access, bounded output. If it misbehaves, the agent terminates it cleanly (SIGTERM first, then SIGKILL after a grace period).
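Putting those steps and limits together, here's a host-side sketch in Python. One caveat: a real implementation would cap output while streaming rather than after the fact, and `invoke_skill` is an illustrative name, not the agent's API:

```python
import json
import subprocess

def invoke_skill(cmd: list[str], request: dict,
                 timeout: float = 30.0, max_output: int = 1_000_000) -> dict:
    """Spawn a skill, exchange JSON over stdin/stdout, enforce the limits."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        env={},                          # no inherited environment variables
    )
    try:
        out, _ = proc.communicate(json.dumps(request).encode(), timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()                 # SIGTERM first...
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()                  # ...then SIGKILL after the grace period
        raise TimeoutError(f"skill exceeded {timeout}s timeout")
    if len(out) > max_output:
        raise ValueError("skill output exceeded the 1 MB cap")
    return json.loads(out)
```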
Skills can be written in any language. The agent doesn't care if it's Rust, Node.js, or Python — it only cares about the JSON protocol over stdin/stdout.
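The skill side can be just as simple. A hypothetical echo skill in Python; the request and response field names are placeholders, not the platform's actual schema:

```python
#!/usr/bin/env python3
# Minimal skill: read one JSON request from stdin, write one JSON response to stdout.
import json
import sys

request = json.load(sys.stdin)
response = {"ok": True, "echo": request.get("input", "")}
json.dump(response, sys.stdout)
```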
Assistants: Orchestration on Top
Assistants sit above skills and models. They combine a system prompt, tool bindings, safety guardrails, and workflow logic into a coherent persona.
An assistant might:
- Use a summarization skill to condense a long email
- Call a calendar skill to check your availability
- Route through a local model for privacy-sensitive queries
- Hand off to a human if it can't resolve something
Assistants use a finite state machine internally, so they can handle multi-step workflows, wait for user confirmation, recover from errors, and track progress — all without you writing orchestration code.
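A sketch of what such a state machine might look like. The states and events below are inferred from the behaviors listed above, not the actual implementation:

```python
from enum import Enum, auto

class AssistantState(Enum):
    IDLE = auto()
    PLANNING = auto()
    CALLING_SKILL = auto()
    AWAITING_CONFIRMATION = auto()
    HANDOFF = auto()
    DONE = auto()

# Hypothetical transition table: (state, event) -> next state.
TRANSITIONS = {
    (AssistantState.IDLE, "message"): AssistantState.PLANNING,
    (AssistantState.PLANNING, "needs_tool"): AssistantState.CALLING_SKILL,
    (AssistantState.CALLING_SKILL, "tool_ok"): AssistantState.PLANNING,
    (AssistantState.CALLING_SKILL, "tool_error"): AssistantState.PLANNING,   # recover via planning
    (AssistantState.PLANNING, "needs_confirmation"): AssistantState.AWAITING_CONFIRMATION,
    (AssistantState.AWAITING_CONFIRMATION, "confirmed"): AssistantState.PLANNING,
    (AssistantState.PLANNING, "stuck"): AssistantState.HANDOFF,              # hand off to a human
    (AssistantState.PLANNING, "answered"): AssistantState.DONE,
}

def step(state: AssistantState, event: str) -> AssistantState:
    return TRANSITIONS.get((state, event), state)   # unknown events leave state unchanged
```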
Security by Default
Every model file is verified with SHA-256 checksums before loading. The manifest records the expected hash at publish time, and the agent checks it at install, startup, and upgrade. If a file has been tampered with, it's rejected immediately.
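The check itself is simple to reason about. A sketch using Python's standard hashlib; the function name and signature are illustrative:

```python
import hashlib
from pathlib import Path

def verify_model_file(path: Path, expected_sha256: str) -> None:
    """Hash the file and reject it if the digest doesn't match the manifest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):   # hash in 1 MiB chunks
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise ValueError(f"checksum mismatch for {path.name}; refusing to load")
```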
Skills run in a sandbox with no access to your environment variables, secrets, or filesystem beyond what's explicitly declared. Secrets are stored in an encrypted vault on disk, never passed to child processes.
What's Next
In the next post, we'll walk through building your first skill — a simple tool that your agent can invoke, written in about 30 lines of code.