How I route work across local and hosted models without hard-coding model choice into every product feature.

Learning stage: Foundation pattern
Proof status: Public foundation page exists; concrete routing proof still needs extraction.

Multimodel orchestration

What it is

Multimodel orchestration is the decision layer between an AI feature and the models that can answer it. The feature asks for a kind of work: fast classification, careful writing, code review, local private reasoning, or a fallback when a provider is down. The orchestrator chooses the route, records the reason, and keeps the feature from caring whether the answer came from a local Spark-hosted model, a hosted frontier model, or a cheaper utility model.

Learning goal

Learn to design a model strategy that has more than one viable path. The skill is knowing when to use the strongest model, when to use a cheaper or local model, and when to use a deterministic tool instead of a model at all.

Why it matters in production

Hard-coding one model into every workflow makes the system brittle. A provider outage becomes a product outage. A privacy-sensitive request can accidentally leave the local boundary. A cheap background job can consume an expensive model because nobody separated task intent from model selection.

Orchestration is the place where those failures become visible and boring: route by policy, log what happened, and keep a fallback path available before the incident.
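A minimal sketch of "keep a fallback path available and log what happened", assuming a route is just a callable that takes a prompt and returns text. The names `call_with_fallback`, `flaky_hosted`, and `local_model` are hypothetical stand-ins, not FOS code.

```python
import logging
from typing import Callable

logger = logging.getLogger("router")

# Assumption: a route is any callable that maps a prompt to text.
Route = Callable[[str], str]


def call_with_fallback(prompt: str, primary: Route, fallback: Route) -> tuple[str, str]:
    """Return (route_used, output); switch to the fallback if the primary raises."""
    try:
        return "primary", primary(prompt)
    except Exception as exc:
        # Log the failure type so the switch is visible after the fact.
        logger.warning("primary route failed (%s); using fallback", type(exc).__name__)
        return "fallback", fallback(prompt)


def flaky_hosted(prompt: str) -> str:
    # Stand-in for a hosted provider that is currently down.
    raise TimeoutError("provider unavailable")


def local_model(prompt: str) -> str:
    # Stand-in for a locally hosted model.
    return f"local answer to: {prompt}"


route_used, output = call_with_fallback("classify this ticket", flaky_hosted, local_model)
```

Catching every exception is deliberately blunt here; a real router would probably fall back only on errors it knows to be transient.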

How I actually build it

In FOS, I treat orchestration as product plumbing, not as a demo prompt. Spark hosts local runtime surfaces, the model registry describes available backends, and tickets carry the acceptance criteria for when a new route is useful enough to keep.

The implementation pattern is:

Give each capability a contract for task intent, privacy boundary, cost tolerance, and fallback behavior.
Keep provider credentials in the secret platform instead of page content, prompt files, or deployment logs.
Route through a narrow adapter so downstream product code does not branch on provider names.
Record enough trace metadata to explain why a route was chosen after the fact.
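The four points above can be sketched roughly as follows. This is an illustration under assumptions, not the FOS implementation: `CapabilityContract`, `Backend`, `RouteDecision`, `route`, and the backend names are all hypothetical, and the placeholder backends just echo the prompt.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CapabilityContract:
    intent: str             # what kind of work the feature is asking for
    local_only: bool        # privacy boundary: must the request stay local?
    max_cost_tier: int      # cost tolerance: 0 is cheapest
    fallback_allowed: bool  # fallback behavior: may a second backend be named?


@dataclass(frozen=True)
class Backend:
    name: str
    is_local: bool
    cost_tier: int
    call: Callable[[str], str]  # narrow adapter: prompt in, text out


@dataclass(frozen=True)
class RouteDecision:
    backend: str
    fallback: Optional[str]
    reason: str  # trace metadata explaining the choice
    output: str


def route(contract: CapabilityContract, backends: list[Backend], prompt: str) -> RouteDecision:
    """Pick the first backend, in registry order, that satisfies the contract."""
    eligible = [
        b for b in backends
        if (b.is_local or not contract.local_only)
        and b.cost_tier <= contract.max_cost_tier
    ]
    if not eligible:
        raise LookupError(f"no backend satisfies the contract for {contract.intent!r}")
    chosen = eligible[0]
    fallback = eligible[1].name if contract.fallback_allowed and len(eligible) > 1 else None
    reason = (
        f"intent={contract.intent} local_only={contract.local_only} "
        f"max_cost_tier={contract.max_cost_tier}"
    )
    return RouteDecision(chosen.name, fallback, reason, chosen.call(prompt))


# Placeholder backends. Real ones would wrap provider calls and read
# credentials from the secret platform, never from source or prompt files.
BACKENDS = [
    Backend("hosted-frontier", is_local=False, cost_tier=2, call=lambda p: f"[frontier] {p}"),
    Backend("local-spark", is_local=True, cost_tier=0, call=lambda p: f"[local] {p}"),
    Backend("local-small", is_local=True, cost_tier=0, call=lambda p: f"[small] {p}"),
]

private_review = CapabilityContract("code_review", local_only=True, max_cost_tier=1, fallback_allowed=True)
decision = route(private_review, BACKENDS, "review this diff")
```

Product code only builds a contract and calls `route`; it never mentions a provider name, so swapping a backend means editing the registry list rather than the feature.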

Practice loop

Pick one workflow that currently assumes one model.
Classify the task: judgment, transformation, lookup, extraction, or review.
Write the privacy and cost constraints.
Define a primary route and a fallback route.
Run both once and compare quality, cost, latency, and inspectability.
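The last step can be sketched as a small harness that runs both routes once and puts the measurable parts side by side. The route functions and cost figures below are placeholders I made up for illustration; quality and inspectability still need a human read of the outputs.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RunResult:
    route: str
    output: str
    latency_s: float
    est_cost_usd: float  # assumption: supplied by the caller, not measured


def run_once(route: str, call: Callable[[str], str], prompt: str, est_cost_usd: float) -> RunResult:
    """Call one route once and record wall-clock latency."""
    start = time.perf_counter()
    output = call(prompt)
    latency = time.perf_counter() - start
    return RunResult(route, output, latency, est_cost_usd)


def primary(prompt: str) -> str:
    # Stand-in for the primary model call.
    return "LABEL: billing"


def fallback(prompt: str) -> str:
    # Stand-in for the fallback model call.
    return "LABEL: billing"


prompt = "Classify: 'I was charged twice this month.'"
results = [
    run_once("primary", primary, prompt, est_cost_usd=0.01),   # placeholder cost
    run_once("fallback", fallback, prompt, est_cost_usd=0.0),  # placeholder cost
]

for r in results:
    print(f"{r.route:<9} {r.latency_s * 1000:8.3f} ms  ${r.est_cost_usd:.4f}  {r.output}")

# A crude first quality signal: do the two routes agree on this input?
outputs_agree = results[0].output == results[1].output
```

A single run says little about latency variance, so treat the numbers as a first look rather than a benchmark.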

Proof artifact

A useful proof is a route log that shows the task intent, selected model, fallback eligibility, output, and verification result. The artifact should be understandable without showing raw private prompts.
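One possible shape for such a log entry, assuming the raw prompt is replaced by a SHA-256 digest so entries can be correlated without exposing the text. The field names and sample values are illustrative, not the actual FOS log schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RouteLogEntry:
    task_intent: str
    selected_model: str
    fallback_eligible: bool
    prompt_sha256: str  # digest only; the raw prompt is never stored
    output: str
    verification: str


def make_entry(task_intent: str, selected_model: str, fallback_eligible: bool,
               prompt: str, output: str, verification: str) -> RouteLogEntry:
    """Build a log entry, hashing the prompt instead of recording it."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return RouteLogEntry(task_intent, selected_model, fallback_eligible,
                         digest, output, verification)


entry = make_entry(
    task_intent="extraction",
    selected_model="local-spark",  # hypothetical model name
    fallback_eligible=True,
    prompt="Extract the invoice total from: ACME Corp confidential invoice 17",
    output="total=1200.00",
    verification="matched expected format",
)

# One JSON object per line keeps the log easy to grep and diff.
log_line = json.dumps(asdict(entry), sort_keys=True)
```

A digest hides the prompt only when the prompt is hard to guess, and the output field may itself need redaction before an entry is published.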

Current status

The public page exists as the foundation. The concrete FOS route evidence still needs to be sanitized and linked from a future slice.

What worked, what didn't

The useful move is separating "what kind of judgment is needed" from "which model is popular today." That keeps the architecture stable while the model market changes.

The trap is making the router too clever too early. I prefer a small policy table, explicit fallbacks, and boring logs over an opaque meta-agent that silently rewrites the plan.
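For a sense of scale, a "small policy table" can be as plain as a dictionary keyed by task class. The model names and pairings below are invented for illustration; only the shape is the point.

```python
from typing import NamedTuple, Optional


class Policy(NamedTuple):
    primary: str
    fallback: Optional[str]  # None means there is no fallback for this class


# Hypothetical table: one explicit row per task class, no inference.
POLICY_TABLE = {
    "judgment": Policy("hosted-frontier", "local-spark"),
    "transformation": Policy("utility-small", "local-spark"),
    "lookup": Policy("deterministic-tool", None),  # a tool, not a model
    "extraction": Policy("local-spark", "utility-small"),
    "review": Policy("hosted-frontier", "local-spark"),
}


def select(task_class: str, primary_available: bool = True) -> str:
    """Return the route name for a task class, using the fallback only when told to."""
    policy = POLICY_TABLE[task_class]  # unknown classes raise KeyError on purpose
    if primary_available:
        return policy.primary
    if policy.fallback is None:
        raise RuntimeError(f"no fallback defined for {task_class!r}")
    return policy.fallback
```

Every routing decision here can be explained by pointing at a single row, which is the property an opaque meta-agent gives up.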

Next build

Publish a minimal route registry example and connect it to one FOS ticket or commit that proves the routing pattern exists.