On-device AI for consumer apps
Run open models on phones, laptops, and browsers. Auto-selects the fastest engine for your hardware. Ship to users with staged rollouts and quality monitoring.
Measure
Benchmark your model on real devices. Know what runs on-device and what needs cloud fallback.
Ship
Deploy optimized models to phones with canary rollouts. Automatic rollback on quality regression.
Operate
Monitor quality, latency, and cost savings across your fleet. Route traffic intelligently. A/B test models.
How it works
Start a local inference server with one command. octomil serve auto-detects your hardware and picks the fastest engine — MLX, llama.cpp, MNN, or CoreML.
Deploy models to user devices with octomil deploy. Auto-converts for CoreML and TFLite, rolls out to 10% first, and promotes on quality pass.
Track quality, latency, and memory across your device fleet in the dashboard. Smart routing sends hard queries to the cloud. A/B test model versions.
Enterprise
For regulated industries: train models across devices without centralizing data.
FedAvg, FedProx, SCAFFOLD, and six more. Production-tested aggregation algorithms that handle non-IID data, stragglers, and adversarial devices.
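FedAvg's core aggregation step is a sample-weighted average of per-device updates: devices that trained on more data count for more. A minimal plain-Python sketch of that step (illustrative only, not the Octomil API):

```python
def fedavg(client_weights, client_sizes):
    """Weighted average of per-client model weights (the FedAvg core step).

    client_weights: one weight vector (list of floats) per device
    client_sizes:   number of local training samples on each device
    """
    total = sum(client_sizes)
    aggregated = [0.0] * len(client_weights[0])
    for weights, n in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            aggregated[i] += w * (n / total)
    return aggregated

# Two devices: one trained on 300 samples, one on 100.
# The larger device's update counts 3x in the average.
print(fedavg([[1.0, 2.0], [5.0, 6.0]], [300, 100]))  # [2.0, 3.0]
```

FedProx and SCAFFOLD change how each device computes its local update, but the server-side averaging looks much the same.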
Differential privacy, secure aggregation, gradient clipping. Data never leaves the device. Only anonymous model weight updates are transmitted.
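Gradient clipping plus calibrated noise is the standard recipe behind differential privacy for model updates: bound each device's influence, then add Gaussian noise so no individual contribution can be singled out. A toy sketch; `clip_norm` and `noise_mult` are illustrative values, not Octomil defaults:

```python
import math
import random

def privatize(update, clip_norm=1.0, noise_mult=1.1):
    """Clip an update to a maximum L2 norm, then add Gaussian noise.

    Illustrative DP-SGD-style step; parameter values are made up.
    """
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    sigma = noise_mult * clip_norm          # noise scales with the clip bound
    return [x + random.gauss(0.0, sigma) for x in clipped]

raw_update = [3.0, 4.0]                     # L2 norm 5.0, clipped down to 1.0
noisy_update = privatize(raw_update)        # this is all the server ever sees
```

With `noise_mult=0` the function reduces to pure clipping, which is a handy way to sanity-check the norm bound.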
A/B testing, canary rollouts, per-device telemetry. Deploy to 100+ devices with progressive rollout and automatic rollback on quality regression.
$ octomil train sentiment-v1 --strategy fedavg --rounds 50
# Models improve from real usage. Data stays on each device.
Platform
Route common queries on-device and fall back to cloud for hard or rare inputs. Per-model routing rules, confidence thresholds, and automatic failover. Your cloud bill drops without quality degradation.
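Confidence-threshold routing with failover fits in a few lines. The `route` helper and its 0.85 threshold below are a hypothetical sketch of the policy described above, not the shipped implementation:

```python
def route(confidence, threshold=0.85, cloud_available=True):
    """Decide where a query runs: on-device when the local model is
    confident, cloud for hard inputs, and failover back to the device
    when the cloud is unreachable. Threshold is illustrative."""
    if confidence >= threshold:
        return "device"
    return "cloud" if cloud_available else "device"

assert route(0.95) == "device"                          # easy query stays local
assert route(0.40) == "cloud"                           # hard query falls back
assert route(0.40, cloud_available=False) == "device"   # automatic failover
```

Per-model routing rules amount to choosing a different threshold (and fallback target) per model.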
Deploy to 10% of your fleet first. Promote on quality pass, rollback automatically on regression. Canary deploys, A/B testing, and per-device targeting. Bad models never reach your full user base.
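The promote-or-rollback decision behind a canary can be sketched as a pure function of the canary's measured quality versus baseline. The step ladder and 2% regression bar here are illustrative, not Octomil defaults:

```python
def next_rollout_step(current_pct, canary_quality, baseline_quality,
                      max_regression=0.02):
    """Return the next fleet percentage for a canary: promote on quality
    pass, roll back (0%) on regression beyond the allowed bar."""
    if canary_quality < baseline_quality - max_regression:
        return 0                      # rollback: stop serving the new model
    for step in [10, 25, 50, 100]:    # progressive rollout ladder
        if step > current_pct:
            return step
    return 100                        # already at full fleet

assert next_rollout_step(10, 0.91, 0.90) == 25   # quality holds: promote
assert next_rollout_step(10, 0.85, 0.90) == 0    # regression: roll back
```

Because the decision is a pure function of observed metrics, every promote/rollback choice is reproducible from telemetry.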
Latency, crash rates, memory, battery impact, and quality evals — per device, per model version. Compare on-device vs cloud performance. Know exactly what's happening across your fleet.
Run eval harnesses before and after deployment. Set accuracy thresholds per model. Automatic rollback when quality drops below your bar. Ship with confidence.
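An eval gate is accuracy over a labeled set compared against a threshold. A toy sketch with a made-up sentiment model and eval set (not Octomil's harness):

```python
def passes_eval(model_fn, eval_set, threshold=0.9):
    """Run a labeled eval set through a model; gate deployment on accuracy."""
    correct = sum(1 for text, label in eval_set if model_fn(text) == label)
    return correct / len(eval_set) >= threshold

# Hypothetical eval set and keyword "model" for illustration only.
eval_set = [("good movie", "pos"), ("bad movie", "neg"),
            ("great", "pos"), ("awful", "neg")]
toy_model = lambda t: "neg" if ("bad" in t or "awful" in t) else "pos"

assert passes_eval(toy_model, eval_set, threshold=0.9)   # 4/4: ship it
```

Running the same gate before and after deployment is what makes automatic rollback a mechanical decision rather than a judgment call.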
Python, iOS, and Android SDKs with identical workflows. Five lines to run inference on-device. Ship to phones, tablets, and workstations from a single control plane.
Every model version, rollout decision, and policy change is logged with actor attribution. On-device inference means zero data collection. HIPAA, GDPR, SOC 2 ready by architecture.
Use cases
Inference costs are a top-three line item. Route 80%+ of traffic on-device and keep cloud for the long tail. Your margins improve with every user instead of eroding.
Latency
Autocomplete, content moderation, camera features, voice commands. On-device inference eliminates network round trips. Works offline, works in elevators, works at scale.
Fleet
1,000+ devices, one dashboard. Canary rollouts, A/B testing, per-device telemetry, automatic rollback. Ship model updates without app store releases.
Enterprise
Healthcare, finance, government. On-device inference means zero data collection. HIPAA and GDPR compliance by architecture. Federated learning for privacy-preserving model improvement.
Security and compliance
Raw training data never leaves end-user devices. Only anonymous model weight updates are transmitted to the aggregation server. No user data, no PII, no liability.
Mathematically guaranteed privacy bounds on every model update. Individual user contributions cannot be reverse-engineered from the aggregated model.
Cryptographic protocols ensure the server only sees the combined model update, never individual device contributions. Even Octomil cannot inspect per-device gradients.
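The intuition behind secure aggregation: each pair of devices shares a random mask that one adds and the other subtracts, so every individual upload looks like noise while the masks cancel exactly in the server's sum. A toy sketch of that cancellation (real protocols such as Bonawitz-style secure aggregation derive the masks via key agreement and handle dropouts):

```python
import random

def mask_updates(updates, seed=0):
    """Pairwise-masking sketch: device i adds mask m_ij, device j subtracts
    it, for every pair i < j. Only the sum of all uploads is meaningful."""
    rng = random.Random(seed)
    n = len(updates)
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.uniform(-1000.0, 1000.0) for _ in updates[0]]
            for k, m in enumerate(mask):
                masked[i][k] += m
                masked[j][k] -= m
    return masked

updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masked = mask_updates(updates)
# Each masked[i] looks random, but the column sums recover the true total
# [9.0, 12.0] up to float rounding.
total = [sum(col) for col in zip(*masked)]
```

The server (and Octomil) only ever handles `masked` and `total`, never the raw per-device `updates`.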
We architecturally cannot violate HIPAA. Data never reaches Octomil servers. BAA available on Enterprise tier.
No PII collection. On-device training with data minimization. Processing stays local.
Architecture aligned to SOC 2 trust service criteria. Formal audit planned.
Run Octomil in your own data center or cloud account, within your network boundaries. Fully managed by our team.
Pricing
$0
Deploy models to your own devices. No account needed.
octomil serve + local inference
From $499 / month
Fleet deploy for production workloads.
Custom
On-device routing, compliance, and SLAs.
Get started
Install the CLI, point it at your Ollama model, and scan the QR code.
$ curl -fsSL https://octomil.com/install.sh | sh
$ octomil deploy gemma:2b --phone
Or tell us about your use case and we will get you set up.
Your cloud bill scales with every user
Smart routing, staged rollouts, quality monitoring. Cut cloud costs up to 90% without quality regression. Free for up to 10 devices.