How the Life Savor Runtime Works
The Life Savor agent is a single binary that runs on your machine. It connects to the platform, manages AI models, executes skills, and keeps everything private by default. Here's how it all fits together.
Zero Built-In Models
The agent ships with no model providers baked in. Every model — whether it's GPT-4o via API or a local LLaMA running on your GPU — is delivered as an installable component from the marketplace.
This means the base agent is small and fast to install. You add only the capabilities you need.
The Component Architecture
When you install a model component, it registers with the agent's Integration Registry. The registry tracks each provider's lifecycle: registration, health status, capabilities, and routing.
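To make that concrete, here's a minimal sketch of what such a registry might track. Everything here is illustrative; the class names, the `kind` field, and the health states are my assumptions, not the agent's actual API:

```python
from dataclasses import dataclass
from enum import Enum

class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNAVAILABLE = "unavailable"

@dataclass
class ProviderEntry:
    name: str
    kind: str                   # "local", "gateway", or "byok" (see the table below)
    capabilities: set[str]      # e.g. {"chat", "embeddings"}
    health: Health = Health.UNAVAILABLE

class IntegrationRegistry:
    """Tracks installed model providers and their lifecycle state."""

    def __init__(self) -> None:
        self._providers: dict[str, ProviderEntry] = {}

    def register(self, entry: ProviderEntry) -> None:
        self._providers[entry.name] = entry

    def set_health(self, name: str, health: Health) -> None:
        self._providers[name].health = health

    def healthy_providers(self, capability: str) -> list[ProviderEntry]:
        return [p for p in self._providers.values()
                if p.health is Health.HEALTHY and capability in p.capabilities]
```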
When you send a message to your agent, the Inference Bridge routes your request to the right model provider based on what's installed and healthy. If you have a local model loaded, it goes there. If you're using a cloud API, it routes through the gateway.
You → Agent → Inference Bridge → Registered Model Provider → Response
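In code, that routing layer can be a thin wrapper over the registry. A sketch reusing the hypothetical types above; the preference for local providers is my assumption, based on the privacy-by-default framing:

```python
class InferenceBridge:
    """Routes each request to a healthy provider, preferring local models."""

    def __init__(self, registry: IntegrationRegistry) -> None:
        self.registry = registry

    def route(self, request: dict) -> ProviderEntry:
        candidates = self.registry.healthy_providers("chat")
        if not candidates:
            raise RuntimeError("no healthy model provider installed")
        local = [p for p in candidates if p.kind == "local"]
        return local[0] if local else candidates[0]
```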
Three Ways to Run a Model
Model components follow one of three patterns:
| Pattern | How it works | Example |
|---|---|---|
| API Gateway | Routes through the Life Savor API for centralized billing | GPT-4o, Claude, Gemini |
| Local | Runs on your hardware via PyTorch or ONNX Runtime | LLaMA 3, Mistral 7B, Phi-3 |
| BYOK | Uses your own API key, calls the vendor directly | GPT-4o-BYOK, Claude-BYOK |
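As a rough illustration, here's what a component declaration might look like under each pattern. The field names are invented for this post; the real manifest schema may differ:

```python
# Hypothetical component manifests, one per pattern (field names are illustrative).
gateway_model = {"name": "gpt-4o", "pattern": "api_gateway"}        # billed centrally
local_model = {"name": "mistral-7b", "pattern": "local",
               "runtime": "onnx", "weights": "mistral-7b.onnx"}     # runs on your GPU
byok_model = {"name": "gpt-4o-byok", "pattern": "byok",
              "api_key_secret": "openai_api_key"}                   # key from your vault
```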
Local Model Execution
For local models, the agent includes a dual-runtime engine supporting both PyTorch and ONNX Runtime. It automatically detects your hardware — CUDA on NVIDIA GPUs, Metal Performance Shaders on Apple Silicon, DirectML on Windows — and picks the fastest path.
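The detection order might look roughly like this. The library calls are real PyTorch and ONNX Runtime APIs, but the fallback order and the function name are my assumptions:

```python
import torch
import onnxruntime as ort

def pick_device() -> str:
    """Probe available hardware, fastest path first."""
    if torch.cuda.is_available():                                  # NVIDIA GPU
        return "cuda"
    if torch.backends.mps.is_available():                          # Apple Silicon
        return "mps"
    if "DmlExecutionProvider" in ort.get_available_providers():    # DirectML on Windows
        return "dml"
    return "cpu"                                                   # safe fallback
```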
Models move through three states:
- Hot — loaded in memory, ready for instant inference
- Warm — partially loaded, fast to activate
- Cold — on disk, needs to be loaded before use
The agent manages these transitions automatically based on usage patterns and available memory.
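Here's one plausible shape for that policy. The promotion and demotion rules below are guesses, not the agent's documented heuristics:

```python
from enum import Enum, auto

class ModelState(Enum):
    COLD = auto()   # weights on disk
    WARM = auto()   # partially loaded, e.g. memory-mapped
    HOT = auto()    # fully resident in memory

def on_request(state: ModelState) -> ModelState:
    """Any inference request promotes the model straight to hot."""
    return ModelState.HOT

def on_memory_pressure(state: ModelState) -> ModelState:
    """Demote one step at a time: hot models drop to warm before eviction."""
    return ModelState.WARM if state is ModelState.HOT else ModelState.COLD
```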
Skill Execution
Skills are sandboxed extensions that give your agent new capabilities — think of them as tools. A skill might fetch your calendar, summarize a document, or query a database.
When the agent invokes a skill, it:
- Spawns the skill as a child process
- Sends a JSON request to the skill's stdin
- Reads the JSON response from stdout
- Enforces a timeout (30 seconds by default)
- Caps output at 1 MB to prevent runaway output
The skill runs in a restricted environment — no inherited environment variables, limited filesystem access, bounded output. If it misbehaves, the agent terminates it cleanly (SIGTERM first, then SIGKILL after a grace period).
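Putting those steps and limits together, here's a host-side sketch in Python. One caveat: a real implementation would cap output while streaming rather than after the fact, and `invoke_skill` is an illustrative name, not the agent's API:

```python
import json
import subprocess

def invoke_skill(cmd: list[str], request: dict,
                 timeout: float = 30.0, max_output: int = 1_000_000) -> dict:
    """Spawn a skill, exchange JSON over stdin/stdout, enforce the limits."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        env={},                          # no inherited environment variables
    )
    try:
        out, _ = proc.communicate(json.dumps(request).encode(), timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()                 # SIGTERM first...
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()                  # ...then SIGKILL after the grace period
        raise TimeoutError(f"skill exceeded {timeout}s timeout")
    if len(out) > max_output:
        raise ValueError("skill output exceeded the 1 MB cap")
    return json.loads(out)
```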
Skills can be written in any language. The agent doesn't care if it's Rust, Node.js, or Python — it only cares about the JSON protocol over stdin/stdout.
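The skill side can be just as simple. A hypothetical echo skill in Python; the request and response field names are placeholders, not the platform's actual schema:

```python
#!/usr/bin/env python3
# Minimal skill: read one JSON request from stdin, write one JSON response to stdout.
import json
import sys

request = json.load(sys.stdin)
response = {"ok": True, "echo": request.get("input", "")}
json.dump(response, sys.stdout)
```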
Assistants: Orchestration on Top
Assistants sit above skills and models. They combine a system prompt, tool bindings, safety guardrails, and workflow logic into a coherent persona.
An assistant might:
- Use a summarization skill to condense a long email
- Call a calendar skill to check your availability
- Route through a local model for privacy-sensitive queries
- Hand off to a human if it can't resolve something
Assistants use a finite state machine internally, so they can handle multi-step workflows, wait for user confirmation, recover from errors, and track progress — all without you writing orchestration code.
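A sketch of what such a state machine might look like. The states and events below are inferred from the behaviors listed above, not the actual implementation:

```python
from enum import Enum, auto

class AssistantState(Enum):
    IDLE = auto()
    PLANNING = auto()
    CALLING_SKILL = auto()
    AWAITING_CONFIRMATION = auto()
    HANDOFF = auto()
    DONE = auto()

# Hypothetical transition table: (state, event) -> next state.
TRANSITIONS = {
    (AssistantState.IDLE, "message"): AssistantState.PLANNING,
    (AssistantState.PLANNING, "needs_tool"): AssistantState.CALLING_SKILL,
    (AssistantState.CALLING_SKILL, "tool_ok"): AssistantState.PLANNING,
    (AssistantState.CALLING_SKILL, "tool_error"): AssistantState.PLANNING,   # recover via planning
    (AssistantState.PLANNING, "needs_confirmation"): AssistantState.AWAITING_CONFIRMATION,
    (AssistantState.AWAITING_CONFIRMATION, "confirmed"): AssistantState.PLANNING,
    (AssistantState.PLANNING, "stuck"): AssistantState.HANDOFF,              # hand off to a human
    (AssistantState.PLANNING, "answered"): AssistantState.DONE,
}

def step(state: AssistantState, event: str) -> AssistantState:
    return TRANSITIONS.get((state, event), state)   # unknown events leave state unchanged
```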
Security by Default
Every model file is verified with SHA-256 checksums before loading. The manifest records the expected hash at publish time, and the agent checks it at install, startup, and upgrade. If a file has been tampered with, it's rejected immediately.
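The check itself is simple to reason about. A sketch using Python's standard hashlib; the function name and signature are illustrative:

```python
import hashlib
from pathlib import Path

def verify_model_file(path: Path, expected_sha256: str) -> None:
    """Hash the file and reject it if the digest doesn't match the manifest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):   # hash in 1 MiB chunks
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise ValueError(f"checksum mismatch for {path.name}; refusing to load")
```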
Skills run in a sandbox with no access to your environment variables, secrets, or filesystem beyond what's explicitly declared. Secrets are stored in an encrypted vault on disk, never passed to child processes.
What's Next
In the next post, we'll walk through building your first skill — a simple tool that your agent can invoke, written in about 30 lines of code.