Digital Sovereignty
Local LLMs, VRAM Optimization, Llama 4, and the Sovereign Software Stack
Running language models locally can reduce exposure to third-party data retention and subpoena risk in some scenarios. Actual privacy posture depends on configuration, hardware, network setup, and threat model. This is not security or legal advice.
GPU Options for Local AI (2026)
| GPU Model | VRAM | Bandwidth | Ideal Workload |
|---|---|---|---|
| NVIDIA RTX 3090 | 24GB GDDR6X | 936 GB/s | Budget large VRAM; 7B-30B models |
| NVIDIA RTX 4090 | 24GB GDDR6X | 1,008 GB/s | Proven baseline; 30B models |
| NVIDIA RTX 5090 | 32GB GDDR7 | 1,792 GB/s | Flagship; 70B models (Q2/IQ2, or Q4 with partial CPU offload) |
| Mac Studio M3 Ultra | 512GB (Shared) | 819 GB/s | Ultra-large models (up to 405B, quantized) |
VRAM and Model Parameter Ratios
VRAM Requirements by Quantization
| Model Size | VRAM (FP16/Raw) | VRAM (Q4) | VRAM (Q2/IQ2) |
|---|---|---|---|
| 7B - 8B | 14GB - 16GB | 4GB - 5GB | 2GB - 3GB |
| 30B - 34B | 60GB - 68GB | 19GB - 20GB | 10GB - 12GB |
| 70B | 140GB+ | 35GB - 40GB | 20GB - 22GB |
| 405B | 810GB+ | 200GB+ | 120GB+ |
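The figures in this table follow from simple arithmetic: weight memory is roughly the parameter count times the bits per weight, divided by eight. A minimal Python sketch, assuming about 4.5 effective bits per weight for Q4 and about 2.5 for Q2/IQ2 (illustrative values, not taken from any particular GGUF file) and a hypothetical 2GB of headroom for KV cache and runtime overhead:

```python
# Rough weight-memory estimate. The bits-per-weight values are assumptions
# for illustration; real GGUF files vary by quantization variant.
BITS_PER_WEIGHT = {"fp16": 16.0, "q4": 4.5, "q2": 2.5}


def weights_gb(params_billions: float, quant: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[quant]
    # (params_billions * 1e9 params) * (bits / 8 bytes) / (1e9 bytes per GB)
    return params_billions * bits / 8


def fits(params_billions: float, quant: str, vram_gb: float,
         headroom_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed fixed headroom fit in VRAM."""
    return weights_gb(params_billions, quant) + headroom_gb <= vram_gb


for size in (8, 32, 70, 405):
    row = ", ".join(f"{q}={weights_gb(size, q):.1f}GB" for q in BITS_PER_WEIGHT)
    print(f"{size}B: {row}")
```

Under these assumptions, `fits(70, "q4", 32)` is false while `fits(70, "q2", 32)` is true, which is why a 70B model on a 32GB card usually means a very low-bit quantization or offloading part of the model to system RAM.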
Bandwidth and Throughput
Performance Benchmarks
| Setup | VRAM Total | Bandwidth | Speed (32B Q4) | Price (2026) |
|---|---|---|---|---|
| Single RTX 5090 | 32GB | 1,792 GB/s | 61 tok/s | $2,500 - $3,800 |
| Dual RTX 3090 (Used) | 48GB | 936 GB/s (per card) | 30 tok/s | $1,600 - $1,800 |
| Mac Studio M4 Max | 128GB | 546 GB/s | 40-60 tok/s | $3,500 - $5,000 |
| NVIDIA H100 (PCIe) | 80GB | 2,000 GB/s | 150+ tok/s | $25,000+ |
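Single-stream token generation is usually limited by memory bandwidth, because producing each token reads roughly all of the weights once. A back-of-the-envelope sketch of that ceiling, assuming a dense model and about 19.5GB of weights for a 32B model at Q4 (the midpoint of the 19GB - 20GB range given earlier); measured throughput lands below it:

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on tokens/second if every token reads all weights once."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


WEIGHTS_32B_Q4_GB = 19.5  # assumed midpoint of the 19GB - 20GB range

# Bandwidth figures come from the table. The ceiling ignores compute time,
# KV-cache reads, and multi-GPU communication overhead.
for name, bandwidth in [("RTX 5090", 1792), ("RTX 3090", 936)]:
    ceiling = decode_ceiling_tok_s(bandwidth, WEIGHTS_32B_Q4_GB)
    print(f"{name}: ceiling of about {ceiling:.0f} tok/s")
```

The 61 tok/s and 30 tok/s figures in the table sit at roughly 60-70% of these ceilings, which is a useful sanity check when comparing published benchmarks.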
The Sovereign Software Stack
| Layer | Tool | Purpose |
|---|---|---|
| Engine | Ollama or llama.cpp | Manage and run GGUF-quantized models |
| Interface | Open WebUI or LM Studio | ChatGPT-like front end |
| Workflow builder | n8n | Self-hosted RAG pipelines |
| Character framework | WAFT | Interactive world models and dynamic AI characters |
Recommended models: Dolphin 3.0 (Llama 3.1 8B) for instruction following with few refusals; Qwen2.5-Coder-32B for code generation.
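Once the engine layer is running, everything above it talks to a local HTTP endpoint. A minimal sketch of a direct call to Ollama's `/api/generate` endpoint using only the Python standard library, assuming the server listens on its default port 11434; the model tag in the commented example is an assumption and should be checked against `ollama list`:

```python
import json
import urllib.request

# Default local Ollama address; adjust if the server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call, commented out because it requires Ollama running locally with
# the model already pulled. The tag below is an assumed name.
# print(generate("qwen2.5-coder:32b", "Write a function that reverses a string."))
```

Because the request never leaves `localhost`, the prompt and the response stay on the machine, provided the server is not exposed on a public interface.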
Offline AI for Healthcare
IMPORTANT: This section discusses general informational uses of local AI tools alongside, not in place of, professional medical care. Nothing here is medical advice. Always consult a qualified licensed clinician for any medical decision. If you may have a medical emergency, call your doctor or your local emergency number immediately. Healthcare costs vary widely; figures cited are illustrative. Local AI tools, where appropriate and used responsibly, may help with general health literacy, but they do not diagnose, treat, cure, or prevent any disease.
A local retrieval-augmented system over published medical literature can, in principle, support general health literacy and patient self-education. This is a description of an architecture, not a clinical recommendation. Such a system is not a medical device, has not been evaluated by any regulatory authority, and must never be used in place of a licensed clinician. In a medical emergency, contact emergency services.
Offline Healthcare AI Stack
| Layer | Tool | Purpose | Hardware Req. |
|---|---|---|---|
| Inference engine | Ollama / llama.cpp | Run quantized medical LLM | RTX 3090 or M2 Pro+ |
| Medical knowledge base | PubMed Central Open Access subset + Merck Manual + UpToDate (offline copies, where the license permits) | RAG source corpus | ~50GB SSD |
| Vector database | ChromaDB or Qdrant (self-hosted) | Semantic search over corpus | 8GB RAM minimum |
| RAG orchestration | Anything-LLM or Open WebUI (RAG mode) | Query routing + context injection | Same machine |
| Wearable telemetry | Withings / Garmin local sync | Vital trends without cloud upload | Local WiFi only |
| Emergency reference | Where There Is No Doctor (offline PDF + embedded) | Field-level triage guide | Offline-first |
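The vector database and RAG orchestration rows describe two steps: find the passages closest to a question, then inject them into the prompt. A minimal sketch of that flow in plain Python, using word-count cosine similarity as a stand-in for the embedding search that ChromaDB or Qdrant would perform; the corpus entries, ids, and prompt wording are hypothetical placeholders:

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k passages most similar to the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc_id: cosine(q, tokenize(corpus[doc_id])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query: str, corpus: dict[str, str], k: int = 2) -> str:
    """Inject the retrieved passages, with their ids, ahead of the question."""
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}"
                        for doc_id in retrieve(query, corpus, k))
    return ("Answer using only the sources below and cite their ids.\n"
            f"{context}\n\nQuestion: {query}")


# Hypothetical placeholder passages, not real literature.
corpus = {
    "doc-1": "Review article describing how sleep duration is measured in cohort studies.",
    "doc-2": "Glossary entry explaining what a randomized controlled trial is.",
    "doc-3": "Overview of how wearable devices estimate resting heart rate.",
}
print(build_prompt("What is a randomized controlled trial?", corpus, k=1))
```

Keeping the passage ids in the prompt lets the interface show which source each statement came from, so a reader can check the original text rather than trusting the summary.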
Open Models Some Explore for Reading Medical Literature
| Model | Parameters | Strength | General Reading Use |
|---|---|---|---|
| Med42-v2 (M42 Health) | 70B (Q4) | Trained on medical text | Summarizing public medical literature |
| BioMistral-7B | 7B (Q4) | Biomedical literature comprehension | Research summaries, literature lookup |
| Llama 3.1-70B | 70B (Q4) | General reasoning | General reading and study |
| OpenBioLLM-70B | 70B (Q4) | USMLE-level medical knowledge | Studying medical concepts |
These tools are for general health literacy and reading public literature only. They are NOT for triage, diagnosis, treatment, drug, or mental-health decisions, and are not a substitute for a licensed clinician. If you may have a medical emergency, call your doctor or your local emergency number immediately.
Critical implementation notes:
(1) All medical AI outputs are informational, not diagnostic. The stack must surface this disclaimer in the UI at every response.
(2) The corpus must be version-controlled and updated quarterly - stale medical literature is worse than no literature.
(3) For community deployments, run the stack on a dedicated machine accessible over local LAN - members can query from any device without internet.
(4) Pair with a physical medical kit: tourniquets, wound closure strips, SAM splints, and a printed copy of "Where There Is No Doctor" - analog backup for power-out scenarios.
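Notes (1) and (2) can be enforced in code rather than left to convention. A minimal sketch, assuming a hypothetical disclaimer string and treating "quarterly" as 92 days; both are illustrative choices, not requirements of any tool named here:

```python
from datetime import date

# Hypothetical wording; use whatever text the deployment has agreed on.
DISCLAIMER = "Informational only, not a diagnosis. Consult a licensed clinician."
MAX_CORPUS_AGE_DAYS = 92  # assumed reading of "quarterly"


def with_disclaimer(model_output: str) -> str:
    """Append the disclaimer to a response unless it is already present."""
    text = model_output.rstrip()
    if text.endswith(DISCLAIMER):
        return text
    return f"{text}\n\n{DISCLAIMER}"


def corpus_is_stale(last_updated: date, today: date) -> bool:
    """True when the corpus snapshot is older than the allowed age."""
    return (today - last_updated).days > MAX_CORPUS_AGE_DAYS


print(with_disclaimer("Summary of the retrieved passages."))
print(corpus_is_stale(date(2026, 1, 1), date(2026, 6, 1)))
```

Routing every model response through `with_disclaimer` before it reaches the interface means the notice cannot be dropped by a prompt change, and a failing `corpus_is_stale` check can block queries until the corpus is refreshed.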
Local research tools can reduce information asymmetry, but they are not a substitute for a licensed medical professional. Always work with a qualified clinician for any medical decision.
Figures are approximate and illustrative. Any statistics, costs, or percentages in this chapter are one author's rough estimates drawn from public reporting and may be out of date or wrong; verify against current primary sources before relying on any of them. Any products, vendors, projects, or services named are referenced for information only: mentioning them is not an endorsement, recommendation, or affiliation, and this site receives no compensation for any link. Evaluate fit, safety, cost, and legality for your own situation, and consult qualified licensed professionals before acting.