On-device AI for consumer apps

On-device AI inference. One command.

Run open models on phones, laptops, and browsers. Auto-selects the fastest engine for your hardware. Ship to users with staged rollouts and quality monitoring.

  • One command to run: octomil serve starts inference on any hardware
  • 13 engines, auto-selected: MLX, llama.cpp, CoreML, MNN, ONNX Runtime, and 8 more
  • Ship without app updates: OTA model delivery to phones, laptops, and browsers
Measure

Benchmark your model on real devices. Know what runs on-device and what needs cloud fallback.

Ship

Deploy optimized models to phones with canary rollouts. Automatic rollback on quality regression.

Operate

Monitor quality, latency, and cost savings across your fleet. Route traffic intelligently. A/B test models.

How it works

Run. Ship. Monitor.

1

Run

Start a local inference server with one command. octomil serve auto-detects your hardware and picks the fastest engine — MLX, llama.cpp, MNN, or CoreML.

2

Ship

Deploy models to user devices with octomil deploy. Auto-converts for CoreML and TFLite, rolls out to 10% first, and promotes on quality pass.

3

Monitor

Track quality, latency, and memory across your device fleet in the dashboard. Smart routing sends hard queries to the cloud. A/B test model versions.

Enterprise

Privacy-preserving training

For regulated industries: train models across devices without centralizing data.

9 FL strategies

FedAvg, FedProx, Scaffold, and 6 more. Production-tested aggregation algorithms that handle non-IID data, stragglers, and adversarial devices.
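The weighted-averaging step at the heart of FedAvg fits in a few lines. A toy sketch in plain Python, illustrative only and not Octomil's production aggregator:

```python
def fedavg(updates):
    """Federated averaging: weight each device's model update by its
    number of local training samples, then average."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    # Weighted sum of the device weight vectors, normalized by total samples.
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]

# Three devices report weight vectors with different local sample counts.
clients = [([1.0, 0.0], 10), ([0.0, 1.0], 30), ([1.0, 1.0], 60)]
merged = fedavg(clients)  # larger clients dominate the average
```

Strategies like FedProx and Scaffold replace this plain average with corrections for non-IID data and stragglers, but the aggregation shape is the same.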

Privacy built in

Differential privacy, secure aggregation, gradient clipping. Data never leaves the device. Only anonymous model weight updates are transmitted.
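The clip-and-noise step behind gradient clipping and differential privacy can be sketched as follows. A minimal illustration; the parameter values are assumptions, not Octomil defaults:

```python
import math
import random

def privatize(update, clip_norm=1.0, noise_sigma=0.5, rng=None):
    """Clip an update to a maximum L2 norm, then add Gaussian noise,
    so no single device's contribution can be recovered."""
    rng = rng or random.Random(0)
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    return [x + rng.gauss(0.0, noise_sigma) for x in clipped]

raw = [3.0, 4.0]          # L2 norm 5.0
private = privatize(raw)  # norm capped at 1.0 before noise is added
```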

Fleet management

A/B testing, canary rollouts, per-device telemetry. Deploy to 100+ devices with progressive rollout and automatic rollback on quality regression.

$ octomil train sentiment-v1 --strategy fedavg --rounds 50
# Models improve from real usage. Data stays on each device.

Platform

Everything you need to go on-device

Smart routing

Route common queries on-device and fall back to cloud for hard or rare inputs. Per-model routing rules, confidence thresholds, and automatic failover. Your cloud bill drops without quality degradation.
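The routing decision reduces to a confidence check against a per-model rule. A minimal sketch, with hypothetical rule names and threshold values:

```python
RULES = {  # hypothetical per-model routing rules; values are illustrative
    "gemma:2b": {"threshold": 0.85, "fallback": "cloud"},
}

def route(model, confidence):
    """Serve on-device when the local model clears its confidence
    threshold; otherwise fail over to the configured fallback."""
    rule = RULES[model]
    return "on-device" if confidence >= rule["threshold"] else rule["fallback"]
```

Common queries clear the threshold and stay local; hard or rare inputs fall through to the cloud, which is what cuts the cloud bill without hurting quality.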

Staged rollouts

Deploy to 10% of your fleet first. Promote on quality pass; roll back automatically on regression. Canary deploys, A/B testing, and per-device targeting. Bad models never reach your full user base.
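The promote-or-rollback logic can be sketched as a small state machine. The traffic slices here are illustrative, not Octomil's fixed stages:

```python
STAGES = (0.10, 0.50, 1.0)  # canary, half fleet, full rollout

def next_stage(stage, quality_ok):
    """Promote a rollout to the next traffic slice when evals pass;
    roll back to 0% of the fleet on a quality regression."""
    if not quality_ok:
        return 0.0  # automatic rollback: pull the model entirely
    i = STAGES.index(stage)
    return STAGES[min(i + 1, len(STAGES) - 1)]
```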

Observability

Latency, crash rates, memory, battery impact, and quality evals — per device, per model version. Compare on-device vs cloud performance. Know exactly what's happening across your fleet.

Quality guardrails

Run eval harnesses before and after deployment. Set accuracy thresholds per model. Automatic rollback when quality drops below your bar. Ship with confidence.

Drop-in SDK

Python, iOS, and Android SDKs with identical workflows. Five lines to run inference on-device. Ship to phones, tablets, and workstations from a single control plane.

Audit-ready compliance

Every model version, rollout decision, and policy change is logged with actor attribution. On-device inference means zero data collection. Architected for HIPAA, GDPR, and SOC 2.

Integrations

Works with your existing ML stack

PyTorch
TensorFlow
ONNX
CoreML
TFLite
iOS
Android
ollama

Use cases

Where on-device inference changes the math

Security and compliance

Compliant by architecture, not by policy

Data minimization

Raw training data never leaves end-user devices. Only anonymous model weight updates are transmitted to the aggregation server. No user data, no PII, no liability.

Differential privacy

Mathematically guaranteed privacy bounds on every model update. Individual user contributions cannot be reverse-engineered from the aggregated model.

Secure aggregation

Cryptographic protocols ensure the server only sees the combined model update, never individual device contributions. Even Octomil cannot inspect per-device gradients.
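One standard construction is pairwise additive masking: each pair of devices shares a random mask that one adds and the other subtracts, so the masks cancel in the server's sum while hiding every individual update. A toy sketch of the idea, not the cryptographic protocol Octomil ships:

```python
import random

def masked_updates(updates, seed=0):
    """Apply pairwise additive masks: masks cancel when the server sums
    all updates, but each masked vector alone looks random."""
    rng = random.Random(seed)
    n = len(updates)
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.uniform(-1, 1) for _ in updates[0]]
            for k, m in enumerate(mask):
                masked[i][k] += m  # device i adds the shared mask
                masked[j][k] -= m  # device j subtracts it
    return masked

updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masked = masked_updates(updates)
server_sum = [sum(col) for col in zip(*masked)]  # equals the true sum
```

A production protocol additionally handles dropouts and derives masks from key agreement rather than a shared seed, but the cancellation property is the same.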

Compliance posture

HIPAA: Architecture ready

We architecturally cannot violate HIPAA. Data never reaches Octomil servers. BAA available on Enterprise tier.

GDPR: By design

No PII collection. On-device training with data minimization. Processing stays local.

SOC 2 Type II: On roadmap

Architecture aligned to SOC 2 trust service criteria. Formal audit planned.

On-prem / VPC: Enterprise

Run Octomil in your own data center or cloud account, within your network boundaries. Fully managed by our team.

Pricing

Start free. Scale when you need to.

Growth

From $499 / month

Fleet deploy for production workloads.

  • Up to 10,000 devices
  • Fleet deploy and OTA updates
  • A/B testing and experiments
  • SSO authentication
  • 90-day data retention
  • Email support

Enterprise

Custom

On-device routing, compliance, and SLAs.

  • Unlimited devices
  • Smart routing + cloud fallback
  • Federated learning (9 strategies)
  • VPC deployment
  • BAA available
  • Audit log export
  • Dedicated support + SLA

Get started

Deploy your first model in 60 seconds

Install the CLI, point it at your ollama model, and scan the QR code.

$ curl -fsSL https://octomil.com/install.sh | sh
$ octomil deploy gemma:2b --phone

Or tell us about your use case and we will get you set up.

Your cloud bill scales with every user

Move inference to their devices.

Smart routing, staged rollouts, quality monitoring. Cut cloud costs up to 90% without quality regression. Free for up to 10 devices.