Routing Explorer
Inference arbitrage.
Pareto-optimal.
The Inference Arbitrage Router maintains a provider state matrix refreshed every 200ms and solves a lightweight Pareto optimisation at dispatch time. Four priority modes map requests to distinct regions on the Pareto frontier.
200ms State refresh
4 Priority modes
0.94 CBEI production
34ms Failover latency
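To make the dispatch-time Pareto step concrete, here is a minimal sketch of a dominance filter over provider snapshots. It is an illustration under stated assumptions, not the router's actual implementation: the field names, the sample values, and the functions `dominates` and `pareto_frontier` are all hypothetical. It assumes lower cost and latency are better, and higher quality and availability are better.

```python
from typing import Dict, List

# Hypothetical snapshot of one provider's four signals.
Provider = Dict[str, float]


def dominates(a: Provider, b: Provider) -> bool:
    """True if a is at least as good as b on every signal and strictly better on one."""
    at_least_as_good = (
        a["cost"] <= b["cost"]
        and a["latency"] <= b["latency"]
        and a["quality"] >= b["quality"]
        and a["availability"] >= b["availability"]
    )
    strictly_better = (
        a["cost"] < b["cost"]
        or a["latency"] < b["latency"]
        or a["quality"] > b["quality"]
        or a["availability"] > b["availability"]
    )
    return at_least_as_good and strictly_better


def pareto_frontier(providers: Dict[str, Provider]) -> List[str]:
    """Names of providers that no other provider dominates, sorted alphabetically."""
    return sorted(
        name
        for name, p in providers.items()
        if not any(dominates(q, p) for other, q in providers.items() if other != name)
    )


# Illustrative values only; not real provider data.
providers = {
    "A": {"cost": 1.0, "latency": 120.0, "quality": 0.90, "availability": 1.0},
    "B": {"cost": 0.5, "latency": 200.0, "quality": 0.80, "availability": 1.0},
    "C": {"cost": 1.2, "latency": 150.0, "quality": 0.85, "availability": 1.0},
}
frontier = pareto_frontier(providers)
```

In this toy data, provider C is worse than A on cost, latency, and quality, so it drops off the frontier, while A and B remain because each beats the other on at least one signal. A priority mode then picks one point from that frontier.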
Priority modes
Four Pareto modes
balanced
Social optimum
Weights all four signals equally. Produces the Pareto-optimal allocation closest to the social optimum.
cost: 0.25
latency: 0.25
quality: 0.25
availability: 0.25
cheapest
Cost minimiser
Minimises cost subject to a quality floor of the mean minus 2 standard deviations.
cost: 0.80
latency: 0.067
quality: 0.067
availability: 0.067
fastest
Latency minimiser
Minimises round-trip time subject to a cost ceiling of the mean plus 1 standard deviation.
cost: 0.067
latency: 0.80
quality: 0.067
availability: 0.067
quality
Quality maximiser
Maximises output quality subject to a cost ceiling of the mean plus 2 standard deviations.
cost: 0.067
latency: 0.067
quality: 0.80
availability: 0.067
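The four modes can be read as a weight vector plus an optional constraint. The sketch below shows one way the two might combine: filter providers by the mode's floor or ceiling, then pick the highest weighted score. Several things here are assumptions rather than documented behaviour: the min-max normalisation, the use of population standard deviation (`pstdev`), the sample values, and the names `MODE_WEIGHTS`, `MODE_CONSTRAINTS`, and `select_provider`. The listed weights are rounded, so the non-balanced modes sum to 1.001.

```python
from statistics import mean, pstdev

# Weights as listed for each mode.
MODE_WEIGHTS = {
    "balanced": {"cost": 0.25, "latency": 0.25, "quality": 0.25, "availability": 0.25},
    "cheapest": {"cost": 0.80, "latency": 0.067, "quality": 0.067, "availability": 0.067},
    "fastest": {"cost": 0.067, "latency": 0.80, "quality": 0.067, "availability": 0.067},
    "quality": {"cost": 0.067, "latency": 0.067, "quality": 0.80, "availability": 0.067},
}

# (signal, kind, k): "floor" keeps signal >= mean - k*sd, "ceiling" keeps signal <= mean + k*sd.
MODE_CONSTRAINTS = {
    "balanced": None,
    "cheapest": ("quality", "floor", 2.0),
    "fastest": ("cost", "ceiling", 1.0),
    "quality": ("cost", "ceiling", 2.0),
}

LOWER_IS_BETTER = {"cost", "latency"}


def _utility(value, values, signal):
    """Min-max scale to [0, 1] so that 1 is always the best candidate (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 1.0
    scaled = (value - lo) / (hi - lo)
    return 1.0 - scaled if signal in LOWER_IS_BETTER else scaled


def select_provider(providers, mode):
    """Apply the mode's constraint, then return the name with the best weighted score."""
    candidates = dict(providers)
    constraint = MODE_CONSTRAINTS[mode]
    if constraint is not None:
        signal, kind, k = constraint
        values = [p[signal] for p in providers.values()]
        mu, sd = mean(values), pstdev(values)
        if kind == "floor":
            candidates = {n: p for n, p in providers.items() if p[signal] >= mu - k * sd}
        else:
            candidates = {n: p for n, p in providers.items() if p[signal] <= mu + k * sd}

    weights = MODE_WEIGHTS[mode]
    columns = {s: [p[s] for p in candidates.values()] for s in weights}

    def score(p):
        return sum(w * _utility(p[s], columns[s], s) for s, w in weights.items())

    return max(candidates, key=lambda name: score(candidates[name]))


# Illustrative values only; not real provider data.
providers = {
    "A": {"cost": 1.0, "latency": 120.0, "quality": 0.92, "availability": 1.0},
    "B": {"cost": 0.5, "latency": 200.0, "quality": 0.80, "availability": 1.0},
    "C": {"cost": 2.0, "latency": 80.0, "quality": 0.95, "availability": 1.0},
}
choices = {mode: select_provider(providers, mode) for mode in MODE_WEIGHTS}
```

With these toy values, `fastest` selects A rather than C: C has the lowest latency, but its cost exceeds the mean-plus-one-standard-deviation ceiling, so the constraint removes it before scoring.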
Provider matrix
State matrix structure
P matrix — n providers × 4 signals
| Provider | Cost/token | Latency EWMA | Quality score | Availability |
|---|---|---|---|---|
| Provider [A] | Tier A cost | EWMA α=0.3 | WORM-derived | Binary, 200ms refresh |
| Provider [B] | Tier B cost | EWMA α=0.3 | WORM-derived | Binary, 200ms refresh |
| Provider [C] | Tier C cost | EWMA α=0.3 | WORM-derived | Binary, 200ms refresh |
| Provider [N] | Variable | EWMA α=0.3 | WORM-derived | Binary, 200ms refresh |
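One row of the state matrix could be modelled as below. This is a sketch only: the class `ProviderRow`, its field names, and the sample values are hypothetical. It also assumes the common EWMA convention in which α weights the newest sample, which the table does not state explicitly.

```python
from dataclasses import dataclass

ALPHA = 0.3  # smoothing factor from the table; assumed to weight the newest sample


@dataclass
class ProviderRow:
    """One row of the P matrix: four signals for a single provider."""

    cost_per_token: float
    latency_ewma_ms: float
    quality_score: float
    available: bool  # binary availability flag

    def observe_latency(self, sample_ms: float) -> None:
        """Fold a new latency sample into the running EWMA."""
        self.latency_ewma_ms = ALPHA * sample_ms + (1 - ALPHA) * self.latency_ewma_ms


# Illustrative values only.
row = ProviderRow(cost_per_token=0.002, latency_ewma_ms=100.0, quality_score=0.9, available=True)
row.observe_latency(200.0)  # one slow sample moves the average 30% of the way toward it
```

Starting from 100 ms, a single 200 ms sample moves the average to roughly 130 ms. With α=0.3 the estimate reacts to sustained changes within a few samples while damping one-off spikes.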