What you get
- OpenAI-compatible API at `http://localhost:8080/v1`
- One-command setup for trying Gemma, Llama, Phi, and more locally
- A fast path for testing prompts, benchmarks, and local-first behavior
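Because the server speaks the OpenAI API, existing OpenAI client code works once you point it at the local endpoint above. Here is a minimal sketch using the OpenAI Python SDK; the model id `gemma` and the placeholder API key are assumptions, so substitute whatever your local server actually serves:

```python
# Minimal sketch: point the official OpenAI Python client at the local
# endpoint listed above. The model id "gemma" and the dummy API key are
# assumptions -- many local servers ignore the key entirely.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # local OpenAI-compatible endpoint
    api_key="not-needed",                 # placeholder; replace if your setup requires one
)

response = client.chat.completions.create(
    model="gemma",  # assumed model id; check what your server exposes via client.models.list()
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

The same snippet runs unchanged against any other OpenAI-compatible endpoint by swapping `base_url`, which is what makes it useful for quick prompt and benchmark checks.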
Start an OpenAI-compatible local inference server on macOS, Linux, or Windows. When you outgrow a single machine, Octomil gives you a path to routing, staged rollouts, and fleet management across real devices.
Use local inference for single-machine work. Move up to the platform when you need routing, device rollout control, fallback policies, or fleet visibility.
Use Octomil when you need a local inference server that behaves like the OpenAI API and lets you test local-first product flows fast.
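As a rough sketch of what a local-first flow can look like during prototyping: prefer the local server, and fall back to a hosted endpoint only if it is unreachable. The hosted client, model ids, and environment variable below are placeholder assumptions, not part of Octomil:

```python
# Minimal local-first sketch: try the local OpenAI-compatible server first,
# fall back to a hosted endpoint if it is down. Hosted details are placeholders.
import os
from openai import OpenAI

LOCAL = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
HOSTED = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", ""))  # hypothetical hosted fallback

def complete(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    try:
        # Local-first: hit the local server before anything else.
        out = LOCAL.chat.completions.create(model="gemma", messages=messages)
    except Exception:
        # Fallback: use the hosted API when the local server is unavailable.
        out = HOSTED.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return out.choices[0].message.content

print(complete("Summarize why local-first inference helps prototyping."))
```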
See local inference docs →
When you need staged rollouts, fallback policies, observability, and security review support, hand off from the CLI to the Octomil platform.
Request a platform demo →
If your use case stops at a single laptop, you may not need rollout or fleet controls yet. Octomil still gives you that upgrade path when the project leaves the prototype stage.
See the OpenAI-compatible setup →