🎯 Eval System · Layer L503

Test, score and regression-check every AI output automatically.

Score AI outputs against test cases — correctness, tone, safety and format. Run regression suites on every deploy. Catch quality drops before users do.


Live £29/mo or £290/yr — save 17%

£29/mo · Starting price

0–1 · Composite score

78% · Earnings back

CI/CD · Integration

Composite scoring · Policy runtime · CI/CD ready · WORM-sealed · 78% back · Free tier

How it works

Eval System in three steps

Production-grade and live on api.forcedream.ai. One Bearer token, zero extra setup.

01 📝
Define

Define test cases — input, expected output, scoring weights. Unlimited cases, reusable across runs.
POST /v1/eval/cases
02 🎯
Score

Each output is scored on correctness, tone, format, safety, latency and cost, and the scores are combined into a single composite score per run.
POST /v1/eval/run-case
03 🔔
Alert

Regression reports flag quality drops automatically. Fail CI/CD builds when regression exceeds your threshold.
GET /v1/eval/report
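
The three steps map onto three HTTP requests. Below is a minimal Python sketch of how they might be built with the standard library. The endpoint paths and the Bearer header come from this page, and the run-case fields match the quick start. The case payload field names (input, expected_output, weights), the build_request helper and the YOUR_KEY placeholder are assumptions for illustration, not the documented schema.

```python
import json
import urllib.request

BASE_URL = "https://api.forcedream.ai"


def build_request(method, path, api_key, body=None):
    """Build (but do not send) an authenticated request to the Eval API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Step 1: define a test case (field names are hypothetical).
define = build_request("POST", "/v1/eval/cases", "YOUR_KEY", {
    "input": "Summarise this article in two sentences.",
    "expected_output": "A two-sentence summary.",
    "weights": {"correctness": 0.5, "tone": 0.2, "format": 0.2, "safety": 0.1},
})

# Step 2: score one output against a case (fields as in the quick start).
score = build_request("POST", "/v1/eval/run-case", "YOUR_KEY", {
    "case_id": "summarise_v1",
    "output": "Here is the summary...",
    "latency_ms": 1200,
    "cost_pence": 4,
})

# Step 3: fetch the regression report.
report = build_request("GET", "/v1/eval/report", "YOUR_KEY")

# To send any of them: urllib.request.urlopen(define)
```

Nothing is sent until urlopen is called, so the payloads can be inspected or logged before they reach the API.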

What's included

Everything you need, nothing you don't

✓ Test case library — define expected outputs per scenario

✓ Composite scoring — correctness/tone/format/safety

✓ Regression reporting — detect quality drops automatically

✓ Policy runtime — enforce output policies at eval time

✓ CI/CD integration — fail builds on quality regression

✓ WORM-sealed eval results — auditable quality records

Quick start

api.forcedream.ai POST /v1/eval/run-case

FORCEDREAM OS · L503 LIVE

$ curl https://api.forcedream.ai/v1/eval/run-case \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"case_id":"summarise_v1","output":"Here is the summary...","latency_ms":1200,"cost_pence":4}'

→ {"composite_score":0.91,"pass_match":true,"latency_score":0.88,"cost_score":0.95,"safety_score":1.0}

$ curl https://api.forcedream.ai/v1/eval/report \
  -H "Authorization: Bearer $KEY"

→ {"total_cases":47,"pass_rate":0.96,"regressions":1,"cases":[{"id":"summarise_v1","delta":-0.04}]}
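
The report response above can be read programmatically to see which cases regressed. This is a small sketch, assuming a negative delta means the case scored lower than before; the regressed_cases helper is a hypothetical name, and the sample body is copied from the response shown above.

```python
import json

# Sample body copied from the report response shown above.
SAMPLE_REPORT = (
    '{"total_cases":47,"pass_rate":0.96,"regressions":1,'
    '"cases":[{"id":"summarise_v1","delta":-0.04}]}'
)


def regressed_cases(report):
    """Return (case id, delta) pairs whose score dropped.

    Assumes a negative `delta` means the case scored lower than before.
    """
    return [(c["id"], c["delta"]) for c in report.get("cases", []) if c["delta"] < 0]


report = json.loads(SAMPLE_REPORT)
for case_id, delta in regressed_cases(report):
    print(f"{case_id}: composite score changed by {delta:+.2f}")
```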

Pricing

Simple, transparent pricing

78% of API earnings flow back to you on every call. No hidden fees. Free tier available.

Starter

£29

/mo · £290/yr

1,000 eval runs/month

Unlimited test cases

Composite scoring

Regression reports


How Eval System compares

Purpose-built for AI products. Not retrofitted from general-purpose tools.

Feature	ForceDream Eval	Braintrust	PromptFoo	LangSmith
AI-native scoring	✓	✓	Partial	✓
78% earnings back	✓	—	—	—
Policy enforcement	✓	—	—	—
CI/CD integration	✓	✓	✓	Partial
WORM audit	✓	—	—	—
Price	£29/mo	£150+/mo	Free	£100+/mo

FAQ

Frequently asked questions

What is a composite score?

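A weighted average of six component scores: correctness (does the output match the expected output?), tone (is it appropriate?), format (does it follow the schema?), safety (does it pass moderation?), latency and cost. Each component is scored from 0 to 1, and the components are combined into one score using your scoring weights.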
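
As an illustration of the arithmetic, here is a minimal sketch of a weighted average over the six components. The weights and component scores are made up for the example, and the exact formula the service uses is not stated on this page; this version simply divides the weighted sum by the total weight.

```python
def composite_score(scores, weights):
    """Weighted average of component scores, each expected in [0, 1]."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    for name in weights:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} score must be between 0 and 1")
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


# Hypothetical component scores and weights for one run.
scores = {"correctness": 0.9, "tone": 0.8, "format": 1.0,
          "safety": 1.0, "latency": 0.88, "cost": 0.95}
weights = {"correctness": 4, "tone": 1, "format": 1,
           "safety": 2, "latency": 1, "cost": 1}

print(round(composite_score(scores, weights), 3))
```

With these numbers the weighted sum is 9.23 over a total weight of 10, so the composite comes to 0.923. Doubling the correctness weight pulls the result towards the correctness score.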

How do I define test cases?

POST to /v1/eval/cases with an input, expected output and scoring weights. Cases are stored in your account and reusable across all future eval runs.

Can I run evals in CI/CD?

Yes. Have a CI step call GET /v1/eval/report and exit with a non-zero code when regressions exceed your threshold, which fails the build. This works with GitHub Actions, CircleCI or any other CI system.
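
A CI step can turn the report into a pass or fail exit code. The sketch below assumes the regressions field of the report is a count and that your threshold is the largest count you will accept; gate_exit_code, main and the gate.py file name are hypothetical.

```python
import json
import sys


def gate_exit_code(report, max_regressions=0):
    """Return 0 to pass the build, 1 to fail it.

    Assumes `regressions` is a count and `max_regressions` is the
    largest count that is still acceptable.
    """
    return 1 if report["regressions"] > max_regressions else 0


def main(report_text, max_regressions=0):
    """Parse a report body and return the exit code for the CI step."""
    report = json.loads(report_text)
    code = gate_exit_code(report, max_regressions)
    if code:
        print(f"{report['regressions']} regression(s) found, failing build",
              file=sys.stderr)
    return code


# In a CI step, pipe the body of GET /v1/eval/report into the script
# (saved here as gate.py) and finish with:
#   sys.exit(main(sys.stdin.read()))
```

Keeping the decision in a pure function makes the threshold easy to unit-test separately from the network call.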

What is the Policy Runtime?

Enforces output policies at evaluation time — e.g. "all outputs must include a disclaimer" or "outputs must not exceed 500 words". Violations fail the eval.
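
To make the two example policies concrete, here is a local sketch of what such checks amount to. It is an illustration only: how policies are configured in the Policy Runtime is not described on this page, and the function names and the "Disclaimer:" marker are assumptions.

```python
def includes_disclaimer(output, marker="Disclaimer:"):
    """Policy: the output must contain a disclaimer marker (marker is hypothetical)."""
    return marker.lower() in output.lower()


def within_word_limit(output, max_words=500):
    """Policy: the output must not exceed `max_words` whitespace-separated words."""
    return len(output.split()) <= max_words


POLICIES = {
    "includes_disclaimer": includes_disclaimer,
    "within_word_limit": within_word_limit,
}


def violations(output, policies=POLICIES):
    """Return the names of every policy the output fails."""
    return [name for name, check in policies.items() if not check(output)]


sample = "Here is the summary... Disclaimer: this is not financial advice."
print(violations(sample))          # no violations for this sample
print(violations("word " * 501))   # fails both policies
```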

Are eval results auditable?

Yes. Every eval run is WORM-sealed (write once, read many): the inputs, outputs, scores and policy verdicts are all immutably recorded.