πŸ₯‹Sensei

The open-source qualification engine for AI agents

Test. Evaluate. Certify.
Before you hire an agent, ask the Sensei.

Three-Layer Evaluation

Every agent is tested across three dimensions. No shortcuts.

Task Execution

Task Execution

Can the agent do the job?

Measure real performance against domain-specific KPIs. Each task is scored on concrete, quantifiable metrics β€” not vibes.

Reasoning

Reasoning

Can it explain its decisions?

Probe the agent’s thought process. Great execution means nothing if the agent can’t articulate why it made a choice.

Self-Improvement

Self-Improvement

Can it learn from feedback?

Give the agent feedback and watch it adapt. The best agents don’t just perform β€” they evolve.

See It In Action

Watch Sensei evaluate an agent in real-time. Pick a suite and see how the three layers unfold.

sensei β€” evaluation
β–‹

How It Works

A simple, structured pipeline from agent to verdict.

πŸ€–Your AgentHTTP / Stdio
πŸ₯‹SenseiEngine
🎯 Task
🧠 Reason
πŸ“ˆ Growth
πŸ“ŠScore0–100
πŸ…BadgeDecision

Built-In Test Suites

Battle-tested evaluation suites for the most common agent roles. Create your own in minutes.

πŸ“ž12 scenarios

SDR

Cold outreach, email personalization, call analysis, and pipeline qualification

🎧15 scenarios

Support

Ticket resolution, multi-turn conversations, escalation handling, CSAT optimization

✍️10 scenarios

Content Writer

Blog posts, social copy, SEO optimization, brand voice consistency

πŸ§ͺ14 scenarios

QA Engineer

Test case generation, bug reporting, regression analysis, coverage assessment

πŸ“Š11 scenarios

Data Analyst

SQL generation, insight extraction, visualization recommendations, anomaly detection

πŸ’»16 scenarios

Developer

Code generation, refactoring, PR review, documentation, debugging

Three Lines to Qualify

Load a suite, create an adapter, run. That's it.

qualify.ts
import { SuiteLoader, Runner, createAdapter } from '@sensei/engine';
const suite = await loader.loadFile('./suites/sdr-qualification/suite.yaml');
const adapter = createAdapter(suite.agent);
const result = await new Runner(adapter).run(suite);
// result.scores.overall β†’ 87.3
// result.badge β†’ "silver"