What you get
- OpenAI-compatible API at `http://localhost:8080/v1`
- One-command setup for trying Gemma, Llama, Phi, and more locally
- A fast path for testing prompts, benchmarks, and local-first behavior
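Because the server speaks the OpenAI API, existing OpenAI client code works once you point it at the local endpoint above. Here is a minimal sketch using the OpenAI Python SDK; the model id `gemma` and the placeholder API key are assumptions, so substitute whatever your local server actually serves:

```python
# Minimal sketch: point the official OpenAI Python client at the local
# endpoint listed above. The model id "gemma" and the dummy API key are
# assumptions -- many local servers ignore the key entirely.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # local OpenAI-compatible endpoint
    api_key="not-needed",                 # placeholder; replace if your setup requires one
)

response = client.chat.completions.create(
    model="gemma",  # assumed model id; check what your server exposes via client.models.list()
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

The same snippet runs unchanged against any other OpenAI-compatible endpoint by swapping `base_url`, which is what makes it useful for quick prompt and benchmark checks.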
Start an OpenAI-compatible local inference server on macOS, Linux, or Windows. When you outgrow a single machine, Octomil gives you a path to routing, staged rollouts, and fleet management across real devices.
Use local inference for single-machine work. Move up to the platform when you need routing, device rollout control, fallback policies, or fleet visibility.
Use Octomil when you need a local inference server that behaves like the OpenAI API and lets you test local-first product flows fast.
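As a rough sketch of what a local-first flow can look like during prototyping: prefer the local server, and fall back to a hosted endpoint only if it is unreachable. The hosted client, model ids, and environment variable below are placeholder assumptions, not part of Octomil:

```python
# Minimal local-first sketch: try the local OpenAI-compatible server first,
# fall back to a hosted endpoint if it is down. Hosted details are placeholders.
import os
from openai import OpenAI

LOCAL = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
HOSTED = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", ""))  # hypothetical hosted fallback

def complete(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    try:
        # Local-first: hit the local server before anything else.
        out = LOCAL.chat.completions.create(model="gemma", messages=messages)
    except Exception:
        # Fallback: use the hosted API when the local server is unavailable.
        out = HOSTED.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return out.choices[0].message.content

print(complete("Summarize why local-first inference helps prototyping."))
```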
See local inference docs →
When you need staged rollouts, fallback policies, observability, and security review support, hand off from the CLI to the Octomil platform.
Request a platform demo →
If your use case stops at a single laptop, you may not need rollout or fleet controls yet. Octomil still gives you that upgrade path when the project leaves the prototype stage.
See the OpenAI-compatible setup →