Local vs Cloud Models: When to Use Each

Life Savor supports both local models (running on your hardware) and cloud models (via API). You're not locked into either — you can mix and match based on what you're doing. Here's how to think about the tradeoff.

When Local Makes Sense

Privacy-sensitive work. If you're processing medical records, legal documents, financial data, or anything you wouldn't want leaving your device, local models keep everything on your hardware. The PII interceptor adds another layer of protection, but local inference means the data never touches a network at all.

Offline use. On a plane, in a tunnel, or somewhere with unreliable internet — local models work regardless. No API calls, no latency spikes, no timeouts.

Predictable latency. Local inference has consistent response times: no cold starts, no rate limits, no queue behind other users. Once a model is loaded and warm, responses start immediately.

Cost at scale. If you're making hundreds of requests per day, local inference is effectively free after the initial model download. No per-token billing.
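
To put rough numbers on that, here's a back-of-envelope sketch. The request volume, token counts, and per-token prices below are illustrative assumptions, not real vendor rates:

    # Back-of-envelope cost comparison (all numbers are assumptions):
    # a hypothetical cloud model billed at $3 per million input tokens and
    # $15 per million output tokens, versus local inference at ~$0 marginal cost.
    requests_per_day = 300
    input_tokens = 600          # average prompt size, assumed
    output_tokens = 400         # average response size, assumed
    in_price = 3 / 1_000_000    # $ per input token, assumed
    out_price = 15 / 1_000_000  # $ per output token, assumed

    daily = requests_per_day * (input_tokens * in_price + output_tokens * out_price)
    print(f"cloud: ~${daily:.2f}/day, ~${daily * 30:.2f}/month")
    # cloud: ~$2.34/day, ~$70.20/month. The same workload on a local model
    # costs only electricity once the weights are downloaded.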

When Cloud Makes Sense

Capability. The largest, most capable models (GPT-4o, Claude 3.7 Sonnet, Gemini 1.5 Pro) require more compute than most personal devices can provide. If you need frontier-level reasoning, cloud APIs deliver it.

Hardware constraints. Running a 70B parameter model requires significant GPU memory. If you're on a laptop without a dedicated GPU, smaller local models may not match the quality you need. Cloud models have no hardware requirements on your end.

Convenience. Cloud models are instant — no download, no setup, no disk space. Install the component and start using it.

The Hybrid Approach

Most users end up with a mix:

  • A local model for everyday tasks (quick questions, drafting, summarization)
  • A cloud model for complex reasoning (analysis, coding, research)
  • The agent routes automatically based on what's available and healthy

You can configure routing preferences — "prefer local when available, fall back to cloud" — or let the agent decide based on the task.
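
As a sketch of what "prefer local when available, fall back to cloud" amounts to, here's the routing logic in miniature. The model names, the Model type, and the is_healthy() check are hypothetical illustrations, not Life Savor's actual configuration API:

    # Hypothetical sketch of preference-ordered routing with health checks.
    from dataclasses import dataclass

    @dataclass
    class Model:
        name: str
        kind: str  # "local" or "cloud"

    PREFERENCES = [
        Model("mistral-7b-q4", "local"),  # everyday tasks
        Model("gpt-4o", "cloud"),         # fallback for complex reasoning
    ]

    def is_healthy(model: Model) -> bool:
        # Placeholder: a real check would probe the local runtime or the API.
        return True

    def route(task: str) -> Model:
        """Return the first available, healthy model in preference order."""
        for model in PREFERENCES:
            if is_healthy(model):
                return model
        raise RuntimeError("no healthy model available")

    print(route("summarize this document").name)  # -> mistral-7b-q4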

Hardware Guidelines for Local Models

Model Size                           RAM Needed   Good For
1-3B (TinyLlama, Phi-3 Mini)         4-6 GB       Simple tasks, edge devices
7-8B (Mistral 7B, LLaMA 3 8B)        8-12 GB      General use, good quality
13-22B (Codestral, DeepSeek Coder)   16-24 GB     Specialized tasks, coding
70B+ (LLaMA 3 70B)                   48+ GB       Near-frontier quality, requires GPU

Quantization cuts memory further: at 4-bit, a model's weights take roughly a quarter of their 16-bit footprint (total savings are somewhat smaller once the KV cache and runtime overhead are counted), with minimal quality loss for most tasks. The marketplace has pre-quantized versions of popular models.
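
The memory math is simple enough to check yourself: weight footprint is parameter count times bytes per weight. A quick sketch (weights only; real usage adds KV cache and runtime overhead):

    # Rough weight-memory estimate: parameter count times bytes per weight.
    def weight_gb(params_billion: float, bits_per_weight: int) -> float:
        return params_billion * 1e9 * (bits_per_weight / 8) / 1e9  # gigabytes

    for params in (7, 70):
        fp16 = weight_gb(params, 16)
        q4 = weight_gb(params, 4)
        print(f"{params}B: {fp16:.0f} GB at FP16 -> {q4:.1f} GB at 4-bit")
    # 7B: 14 GB at FP16 -> 3.5 GB at 4-bit
    # 70B: 140 GB at FP16 -> 35.0 GB at 4-bit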

BYOK: A Middle Ground

If you want cloud-model quality but don't want billing through the platform, BYOK (Bring Your Own Key) components let you use your own API key directly. Your requests go straight to the vendor, with no platform middleman and no markup; you pay the vendor directly.
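
Under the hood, a BYOK request is just a direct call to the vendor's API with your key. As an illustration, here's that call using the OpenAI Python SDK (the component wiring around it is simplified away):

    # What a BYOK component does, roughly: a direct vendor API call with your
    # own key. Other vendors' SDKs follow the same pattern.
    import os
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # your key, your billing

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize the tradeoffs of BYOK."}],
    )
    print(response.choices[0].message.content)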

Bottom Line

There's no wrong answer. Start with what's convenient, then optimize based on your actual usage patterns. The agent makes switching easy — install a new model component, and it's available immediately.