🎯 Eval System · Layer L503

Test, score and regression-check every AI output automatically.

Score AI outputs against test cases — correctness, tone, safety and format. Run regression suites on every deploy. Catch quality drops before users do.

Live · £29/mo or £290/yr (save 17%)
£29/mo · Starting price
0–1 · Composite score
78% · Earnings back
CI/CD · Integration
Composite scoring · Policy runtime · CI/CD ready · WORM-sealed · 78% back · Free tier

How it works

Eval System in three steps

Production-grade and live on api.forcedream.ai. One Bearer token, zero extra setup.

  1. 📝

    Define

    Define test cases — input, expected output, scoring weights. Unlimited cases, reusable across runs.

    POST /v1/eval/cases
  2. 🎯

    Score

    Each output is scored on correctness, tone, format, safety, latency and cost, and the results are combined into a single composite score per run.

    POST /v1/eval/run-case
  3. 🔔

    Alert

    Regression reports flag quality drops automatically. Fail CI/CD builds when regression exceeds your threshold.

    GET /v1/eval/report
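
The Alert step can be pictured as comparing each case's latest composite score with a baseline. The sketch below only illustrates that idea: the function name `find_regressions`, the two score dictionaries, the `translate_v2` case and the 0.03 threshold are hypothetical, and the real report endpoint may compute deltas differently.

```python
from typing import Dict, List, Tuple


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    threshold: float = 0.03,
) -> List[Tuple[str, float]]:
    """Return (case_id, delta) pairs whose score dropped by more than `threshold`."""
    regressions = []
    for case_id, new_score in current.items():
        old_score = baseline.get(case_id)
        if old_score is None:
            continue  # new case: there is no baseline to compare against
        delta = round(new_score - old_score, 4)
        if delta < -threshold:
            regressions.append((case_id, delta))
    # Largest drop first
    return sorted(regressions, key=lambda pair: pair[1])


# Hypothetical scores for two cases (values are illustrative only)
baseline = {"summarise_v1": 0.95, "translate_v2": 0.90}
current = {"summarise_v1": 0.91, "translate_v2": 0.91}
print(find_regressions(baseline, current))
```

Only cases that exist in both runs are compared, so adding a new test case never counts as a regression in this sketch.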

What's included

Everything you need, nothing you don't

Test case library — define expected outputs per scenario
Composite scoring — correctness/tone/format/safety
Regression reporting — detect quality drops automatically
Policy runtime — enforce output policies at eval time
CI/CD integration — fail builds on quality regression
WORM-sealed eval results — auditable quality records

Quick start

api.forcedream.ai POST /v1/eval/run-case
FORCEDREAM OS · L503 LIVE
$ curl https://api.forcedream.ai/v1/eval/run-case \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"case_id":"summarise_v1","output":"Here is the summary...","latency_ms":1200,"cost_pence":4}'

→ {"composite_score":0.91,"pass_match":true,"latency_score":0.88,"cost_score":0.95,"safety_score":1.0}

$ curl https://api.forcedream.ai/v1/eval/report \
  -H "Authorization: Bearer $KEY"

→ {"total_cases":47,"pass_rate":0.96,"regressions":1,"cases":[{"id":"summarise_v1","delta":-0.04}]}
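
The same run-case call can be prepared from Python with the standard library alone. This is a minimal sketch rather than an official client: the helper name `build_run_case_request` is hypothetical, the `KEY` environment variable mirrors the `$KEY` used above, and the JSON content type is an assumption about what the endpoint expects.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.forcedream.ai"


def build_run_case_request(
    api_key: str, case_id: str, output: str, latency_ms: int, cost_pence: int
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/eval/run-case."""
    payload = {
        "case_id": case_id,
        "output": output,
        "latency_ms": latency_ms,
        "cost_pence": cost_pence,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/eval/run-case",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; not stated on this page
        },
        method="POST",
    )


api_key = os.environ.get("KEY", "demo-key")  # "demo-key" is a placeholder
req = build_run_case_request(api_key, "summarise_v1", "Here is the summary...", 1200, 4)

# To actually send it:
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect in tests before any network call is made.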

Pricing

Simple, transparent pricing

78% of API earnings flow back to you on every call. No hidden fees. Free tier available.

Starter · £29/mo · £290/yr
1,000 eval runs/month
Unlimited test cases
Composite scoring
Regression reports
Start free →
Pro (most popular) · £89/mo · £743/yr
10,000 runs/month
Policy runtime
CI/CD integration
Regression alerts
Historical trends
WORM audit
Start free →
Scale · Custom pricing
Unlimited runs
Dedicated cluster
Custom scoring models
SLA
Start free →

Comparison

How Eval System compares

Purpose-built for AI products. Not retrofitted from general-purpose tools.

Feature · ForceDream Eval · Braintrust · PromptFoo · LangSmith
AI-native scoring · Partial
80% earnings back
Policy enforcement
CI/CD integration · Partial
WORM audit
Price · £29/mo · £150+/mo · Free · £100+/mo

FAQ

Frequently asked questions

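How is the composite score calculated?
It is a weighted average of correctness (does the output match the expected one?), tone (is it appropriate?), format (does it follow the schema?), safety (does it pass moderation?), latency and cost. Each component is scored 0–1, and the components are combined into one score.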
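
A weighted average over 0–1 component scores can be sketched as follows. The function name `composite_score` and the example weights are assumptions made for illustration; the actual weights are whatever you configure on your test cases, and the service's exact formula is not documented here.

```python
from typing import Dict


def composite_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of component scores, each expected to lie in [0, 1].

    Every key in `scores` must have a matching entry in `weights`.
    """
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside the 0-1 range")
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical component scores and weights (illustrative values only)
scores = {"correctness": 1.0, "tone": 0.5, "format": 1.0,
          "safety": 1.0, "latency": 0.5, "cost": 1.0}
weights = {"correctness": 2.0, "tone": 1.0, "format": 1.0,
           "safety": 2.0, "latency": 1.0, "cost": 1.0}
print(composite_score(scores, weights))
```

Dividing by the total weight keeps the result in the 0–1 range regardless of how the weights are scaled.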
How do I define test cases?
POST to /v1/eval/cases with an input, an expected output and scoring weights. Cases are stored in your account and are reusable across all future eval runs.
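
A request body for a test case might be assembled like this. The field names `case_id`, `input`, `expected_output` and `weights` are guesses based on the description above, not a confirmed schema, and the `EvalCase` class is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class EvalCase:
    """One test case; field names are assumed, not taken from the API reference."""
    case_id: str
    input: str
    expected_output: str
    weights: Dict[str, float] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialise the case as a JSON body for POST /v1/eval/cases."""
        if any(weight < 0 for weight in self.weights.values()):
            raise ValueError("scoring weights must be non-negative")
        return json.dumps(asdict(self))


case = EvalCase(
    case_id="summarise_v1",
    input="Summarise the attached article in two sentences.",
    expected_output="A two-sentence summary of the article.",
    weights={"correctness": 2.0, "tone": 1.0, "format": 1.0, "safety": 2.0},
)
print(case.to_json())
```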
Can I use Eval System in CI/CD?
Yes. The eval API returns a non-zero exit code when regressions exceed your threshold. It integrates with GitHub Actions, CircleCI or any other CI system.
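
One way to gate a build is to fetch the report and turn it into a process exit code yourself. The sketch below assumes the report shape shown in the quick start; the function name `ci_exit_code` and both thresholds are hypothetical and should be tuned to your own tolerance.

```python
import json
from typing import Any, Dict


def ci_exit_code(
    report: Dict[str, Any], max_drop: float = 0.05, min_pass_rate: float = 0.9
) -> int:
    """Return 1 (fail the build) if quality regressed too far, else 0."""
    deltas = [case["delta"] for case in report.get("cases", [])]
    worst_delta = min(deltas, default=0.0)
    if report.get("pass_rate", 1.0) < min_pass_rate:
        return 1
    if worst_delta < -max_drop:
        return 1
    return 0


# Sample report copied from the quick-start response
report = json.loads(
    '{"total_cases":47,"pass_rate":0.96,"regressions":1,'
    '"cases":[{"id":"summarise_v1","delta":-0.04}]}'
)
code = ci_exit_code(report)
print(code)
# In a CI job, finish with: raise SystemExit(code)
```

With the default 0.05 tolerance the sample report passes; tightening `max_drop` to 0.03 would fail the build on the `summarise_v1` drop.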
What does the policy runtime do?
It enforces output policies at evaluation time, e.g. "all outputs must include a disclaimer" or "outputs must not exceed 500 words". Violations fail the eval.
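
The two example policies can be written as simple predicates over the output text. This is an illustrative sketch only: the `POLICIES` table, the `violations` helper and the keyword check for a disclaimer are assumptions, not the policy runtime's actual rule format.

```python
from typing import Callable, Dict, List

Policy = Callable[[str], bool]

# Each policy returns True when the output complies.
POLICIES: Dict[str, Policy] = {
    # Naive check: looks for the word "disclaimer" anywhere in the text
    "includes_disclaimer": lambda text: "disclaimer" in text.lower(),
    # Whitespace-separated word count must not exceed 500
    "max_500_words": lambda text: len(text.split()) <= 500,
}


def violations(output: str, policies: Dict[str, Policy] = POLICIES) -> List[str]:
    """Return the names of all policies the output violates."""
    return [name for name, check in policies.items() if not check(output)]


good = "Disclaimer: this is not financial advice. Here is the summary."
too_long = "word " * 501
print(violations(good))
print(violations(too_long))
```

An eval would then fail whenever the returned list is non-empty.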
Are eval results auditable?
Yes. Every eval run is WORM-sealed: the inputs, outputs, scores and policy verdicts are all immutably recorded.

Start with Eval System.
Scale to all 22 products.

Free tier available. 80% earnings from your first call. Every call. WORM-sealed by default.

No credit card · 80% earnings guaranteed · WORM-sealed audit