On-device AI for consumer apps

On-device AI inference. One command.

Run open models on phones, laptops, and browsers. Auto-selects the fastest engine for your hardware. Ship to users with staged rollouts and quality monitoring.

  • One command to run: octomil serve starts inference on any hardware
  • 13 engines, auto-selected: MLX, llama.cpp, CoreML, MNN, ONNX Runtime, and 8 more
  • Ship without app updates: OTA model delivery to phones, laptops, and browsers
Measure

Benchmark your model on real devices. Know what runs on-device and what needs cloud fallback.

Ship

Deploy optimized models to phones with canary rollouts. Automatic rollback on quality regression.

Operate

Monitor quality, latency, and cost savings across your fleet. Route traffic intelligently. A/B test models.

How it works

Run. Ship. Monitor.

1

Run

Start a local inference server with one command. octomil serve auto-detects your hardware and picks the fastest engine — MLX, llama.cpp, MNN, or CoreML.

2

Ship

Deploy models to user devices with octomil deploy. Auto-converts for CoreML and TFLite, rolls out to 10% first, and promotes on quality pass.

3

Monitor

Track quality, latency, and memory across your device fleet in the dashboard. Smart routing sends hard queries to the cloud. A/B test model versions.

Enterprise

Privacy-preserving training

For regulated industries: train models across devices without centralizing data.

9 FL strategies

FedAvg, FedProx, Scaffold, and 6 more. Production-tested aggregation algorithms that handle non-IID data, stragglers, and adversarial devices.
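The weighted-averaging step at the heart of FedAvg fits in a few lines. A toy sketch in plain Python, illustrative only and not Octomil's production aggregator:

```python
def fedavg(updates):
    """Federated averaging: weight each device's model update by its
    number of local training samples, then average."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    # Weighted sum of the device weight vectors, normalized by total samples.
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]

# Three devices report weight vectors with different local sample counts.
clients = [([1.0, 0.0], 10), ([0.0, 1.0], 30), ([1.0, 1.0], 60)]
merged = fedavg(clients)  # larger clients dominate the average
```

Strategies like FedProx and Scaffold replace this plain average with corrections for non-IID data and stragglers, but the aggregation shape is the same.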

Privacy built in

Differential privacy, secure aggregation, gradient clipping. Data never leaves the device. Only anonymous model weight updates are transmitted.
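The clip-and-noise step behind gradient clipping and differential privacy can be sketched as follows. A minimal illustration; the parameter values are assumptions, not Octomil defaults:

```python
import math
import random

def privatize(update, clip_norm=1.0, noise_sigma=0.5, rng=None):
    """Clip an update to a maximum L2 norm, then add Gaussian noise,
    so no single device's contribution can be recovered."""
    rng = rng or random.Random(0)
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    return [x + rng.gauss(0.0, noise_sigma) for x in clipped]

raw = [3.0, 4.0]          # L2 norm 5.0
private = privatize(raw)  # norm capped at 1.0 before noise is added
```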

Fleet management

A/B testing, canary rollouts, per-device telemetry. Deploy to 100+ devices with progressive rollout and automatic rollback on quality regression.

$ octomil train sentiment-v1 --strategy fedavg --rounds 50
# Models improve from real usage. Data stays on each device.

Platform

Everything you need to go on-device

Smart routing

Route common queries on-device and fall back to cloud for hard or rare inputs. Per-model routing rules, confidence thresholds, and automatic failover. Your cloud bill drops without quality degradation.
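The routing decision reduces to a confidence check against a per-model rule. A minimal sketch, with hypothetical rule names and threshold values:

```python
RULES = {  # hypothetical per-model routing rules; values are illustrative
    "gemma:2b": {"threshold": 0.85, "fallback": "cloud"},
}

def route(model, confidence):
    """Serve on-device when the local model clears its confidence
    threshold; otherwise fail over to the configured fallback."""
    rule = RULES[model]
    return "on-device" if confidence >= rule["threshold"] else rule["fallback"]
```

Common queries clear the threshold and stay local; hard or rare inputs fall through to the cloud, which is what cuts the cloud bill without hurting quality.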

Staged rollouts

Deploy to 10% of your fleet first. Promote on quality pass; roll back automatically on regression. Canary deploys, A/B testing, and per-device targeting. Bad models never reach your full user base.
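The promote-or-rollback logic can be sketched as a small state machine. The traffic slices here are illustrative, not Octomil's fixed stages:

```python
STAGES = (0.10, 0.50, 1.0)  # canary, half fleet, full rollout

def next_stage(stage, quality_ok):
    """Promote a rollout to the next traffic slice when evals pass;
    roll back to 0% of the fleet on a quality regression."""
    if not quality_ok:
        return 0.0  # automatic rollback: pull the model entirely
    i = STAGES.index(stage)
    return STAGES[min(i + 1, len(STAGES) - 1)]
```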

Observability

Latency, crash rates, memory, battery impact, and quality evals — per device, per model version. Compare on-device vs cloud performance. Know exactly what's happening across your fleet.

Quality guardrails

Run eval harnesses before and after deployment. Set accuracy thresholds per model. Automatic rollback when quality drops below your bar. Ship with confidence.

Drop-in SDK

Python, iOS, and Android SDKs with identical workflows. Five lines to run inference on-device. Ship to phones, tablets, and workstations from a single control plane.

Audit-ready compliance

Every model version, rollout decision, and policy change is logged with actor attribution. On-device inference means zero data collection. Architected for HIPAA, GDPR, and SOC 2.

Integrations

Works with your existing ML stack

PyTorch
TensorFlow
ONNX
CoreML
TFLite
iOS
Android
ollama

Use cases

Where on-device inference changes the math

Security and compliance

Compliant by architecture, not by policy

Data minimization

Raw training data never leaves end-user devices. Only anonymous model weight updates are transmitted to the aggregation server. No user data, no PII, no liability.

Differential privacy

Mathematically guaranteed privacy bounds on every model update. Individual user contributions cannot be reverse-engineered from the aggregated model.

Secure aggregation

Cryptographic protocols ensure the server only sees the combined model update, never individual device contributions. Even Octomil cannot inspect per-device gradients.
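One standard construction is pairwise additive masking: each pair of devices shares a random mask that one adds and the other subtracts, so the masks cancel in the server's sum while hiding every individual update. A toy sketch of the idea, not the cryptographic protocol Octomil ships:

```python
import random

def masked_updates(updates, seed=0):
    """Apply pairwise additive masks: masks cancel when the server sums
    all updates, but each masked vector alone looks random."""
    rng = random.Random(seed)
    n = len(updates)
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.uniform(-1, 1) for _ in updates[0]]
            for k, m in enumerate(mask):
                masked[i][k] += m  # device i adds the shared mask
                masked[j][k] -= m  # device j subtracts it
    return masked

updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masked = masked_updates(updates)
server_sum = [sum(col) for col in zip(*masked)]  # equals the true sum
```

A production protocol additionally handles dropouts and derives masks from key agreement rather than a shared seed, but the cancellation property is the same.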

Compliance posture

HIPAA: Architecture ready

We architecturally cannot violate HIPAA. Data never reaches Octomil servers. BAA available on Enterprise tier.

GDPR: By design

No PII collection. On-device training with data minimization. Processing stays local.

SOC 2 Type II: On roadmap

Architecture aligned to SOC 2 trust service criteria. Formal audit planned.

On-prem / VPC: Enterprise

Run Octomil in your own data center or cloud account, within your network boundaries. Fully managed by our team.

Pricing

Start free. Scale when you need to.

Growth

From $499 / month

Fleet deploy for production workloads.

  • Up to 10,000 devices
  • Fleet deploy and OTA updates
  • A/B testing and experiments
  • SSO authentication
  • 90-day data retention
  • Email support

Enterprise

Custom

On-device routing, compliance, and SLAs.

  • Unlimited devices
  • Smart routing + cloud fallback
  • Federated learning (9 strategies)
  • VPC deployment
  • BAA available
  • Audit log export
  • Dedicated support + SLA

Get started

Deploy your first model in 60 seconds

Install the CLI, point it at your ollama model, and scan the QR code.

$ curl -fsSL https://octomil.com/install.sh | sh
$ octomil deploy gemma:2b --phone

Or tell us about your use case and we will get you set up.

Your cloud bill scales with every user

Move inference to their devices.

Smart routing, staged rollouts, quality monitoring. Cut cloud costs up to 90% without quality regression. Free for up to 10 devices.