# Evals & Testing
The Evals dashboard shows you how well your AI agents perform across real-world scenarios. It runs automated tests — like a practice exam for your agent — and reports a score.
The Evals dashboard is only available to admins. If you don’t see it in your sidebar, contact your account administrator.
## What Gets Tested
Akol tests your agent across 25 scenarios covering the most common call situations:
| Category | Scenarios | What’s Tested |
|---|---|---|
| Dental | 3 | Booking appointments, cancellations, FAQs |
| Restaurant | 3 | Reservations, menu questions, special requests |
| Real Estate | 2 | Property inquiries, scheduling showings |
| Healthcare | 3 | Appointments, insurance questions, symptoms |
| Automotive | 3 | Service booking, maintenance, warranty claims |
| Legal | 3 | Consultations, document requests, billing |
| Function Calling | 4 | Multi-step flows, SMS sending, cross-industry tasks |
| Security | 4 | Prompt injection, data protection, role escape attempts |
Each scenario is a multi-turn conversation — just like a real call — where the test plays the role of a caller and checks how your agent responds.
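To make the shape of a scenario concrete, here is a minimal sketch of how a multi-turn test case might be structured. The field names (`turns`, `assertions`, and so on) are illustrative, not Akol's actual schema:

```python
# Hypothetical scenario definition: a scripted caller plus checks to run
# against the agent's replies. Field names are illustrative only.
scenario = {
    "name": "dental_booking",
    "category": "dental",
    "turns": [
        {"caller": "Hi, I'd like to book a cleaning for next Tuesday."},
        {"caller": "Morning works best, around 9am if possible."},
    ],
    "assertions": [
        {"type": "contains", "value": "Tuesday"},
        {"type": "function_called", "value": "schedule_appointment"},
    ],
}
```

The test harness plays each `caller` turn in order, records the agent's responses and tool calls, then evaluates every assertion against the transcript.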
## Reading Your Results

### Overall Score
At the top of the page you’ll see:
- Score — Percentage of scenarios that passed (100% = all green)
- Passed — How many scenarios passed out of the total
- Duration — How long the full test run took
- Mode — Whether it was a mock or live test
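The summary numbers follow directly from the per-scenario results. A minimal sketch of the arithmetic (the `passed` field name is hypothetical):

```python
def summarize(results):
    """Compute dashboard-style summary stats from per-scenario results.
    Score is the percentage of scenarios that passed."""
    total = len(results)
    passed = sum(1 for r in results if r["passed"])
    return {
        "score": round(100 * passed / total) if total else 0,
        "passed": f"{passed}/{total}",
    }

summarize([{"passed": True}, {"passed": True}, {"passed": False}])
# → {'score': 67, 'passed': '2/3'}
```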
### Scenario Details
Click on any scenario to expand it and see:
- Pass/Fail — Whether all checks passed for this scenario
- Category — Industry, function-calling, or security
- Turn-by-turn conversation — What the caller said, what your agent said, and what actions it took
- Assertion results — Individual checks with pass/fail and reasons
## What Gets Checked
Each scenario runs multiple checks (called assertions) against your agent’s responses:
| Check | What It Verifies |
|---|---|
| Contains | Agent’s response includes a specific phrase |
| Not Contains | Agent’s response avoids certain words |
| Regex Match | Response matches a pattern |
| Function Called | Agent used the right tool (e.g., scheduled an appointment) |
| Function Not Called | Agent didn’t use a tool it shouldn’t have |
| Function Args | Agent passed the correct details to a tool |
| Response Length | Response is within expected word count |
| Tone | Response has the right tone (live mode only) |
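The deterministic checks in this table are straightforward to implement. Here is a sketch of how a few of them might be evaluated; the check-type names mirror the table, but the function and dictionary layout are assumptions, not Akol's actual internals:

```python
import re

def run_assertion(check, response, called_functions):
    """Evaluate one deterministic check against an agent response.
    `check` is {"type": ..., "value": ...}; names are illustrative."""
    kind, expected = check["type"], check["value"]
    if kind == "contains":
        return expected.lower() in response.lower()
    if kind == "not_contains":
        return expected.lower() not in response.lower()
    if kind == "regex_match":
        return re.search(expected, response) is not None
    if kind == "function_called":
        return expected in called_functions
    if kind == "function_not_called":
        return expected not in called_functions
    raise ValueError(f"unknown check type: {kind}")
```

For example, `run_assertion({"type": "contains", "value": "tuesday"}, "Booked for Tuesday at 9am.", [])` returns `True`, while a `function_called` check passes only if the named tool appears in the recorded tool calls.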
## Mock vs Live Mode
Evals can run in two modes:
| Mode | Speed | What It Tests | When to Use |
|---|---|---|---|
| Mock | Very fast (~22ms) | Deterministic checks — did the agent call the right function, say the right thing? | Quick checks, CI pipeline |
| Live | Slower (~2 min) | Everything mock tests + tone and task completion judged by an AI evaluator | Before major releases, thorough testing |
Mock mode skips subjective checks like tone evaluation. Live mode uses an AI judge to evaluate whether the agent’s responses sound right.
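The split between the two modes can be pictured as a dispatch over check types: deterministic checks run in both modes, while subjective checks are skipped in mock mode and delegated to a judge in live mode. A hypothetical sketch (the judge is stubbed as a callable; none of these names are Akol's real API):

```python
def run_scenario(scenario, response, mode="mock", judge=None):
    """Evaluate a scenario's checks. Mock mode skips subjective checks
    (tone); live mode delegates them to an AI-judge callable.
    All names here are illustrative."""
    results = []
    for check in scenario["assertions"]:
        if check["type"] == "tone":
            if mode == "mock":
                continue  # subjective check: skipped in mock mode
            results.append(judge(response, check["value"]))
        else:
            # Simplified stand-in for the deterministic checks above.
            results.append(check["value"] in response)
    return all(results)
```

Usage: with `mode="mock"` the tone check is silently skipped; with `mode="live"` the same scenario also calls `judge(response, "friendly")` and folds its verdict into the pass/fail result.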
## Running Evals
Evals run automatically as part of the CI pipeline. Results are saved and displayed on this dashboard page.
To view the latest results, go to Dashboard > Evals. The page shows the most recent test run with all scenario results.
Evals help catch regressions — if a code change accidentally makes your agent worse at booking appointments or more vulnerable to prompt injection, the eval score will drop and flag it.
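One common way to turn a score drop into a hard failure is a regression gate in the pipeline. This is a hypothetical sketch, not a built-in Akol feature; how you store the baseline and wire the exit code is up to your CI setup:

```python
def gate(current_score, baseline_score, tolerance=0.0):
    """Return a CI exit code: non-zero if the eval score regressed
    past the baseline by more than `tolerance` percentage points."""
    if current_score < baseline_score - tolerance:
        print(f"Eval regression: {current_score}% < baseline {baseline_score}%")
        return 1  # non-zero exit code fails the pipeline
    return 0
```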