
Evals & Testing

The Evals dashboard shows you how well your AI agents perform across real-world scenarios. It runs automated tests — like a practice exam for your agent — and reports a score.

The Evals dashboard is only available to admins. If you don’t see it in your sidebar, contact your account administrator.

What Gets Tested

Akol tests your agent across 25 scenarios covering the most common call situations:

Category          Scenarios   What’s Tested
Dental            3           Booking appointments, cancellations, FAQs
Restaurant        3           Reservations, menu questions, special requests
Real Estate       2           Property inquiries, scheduling showings
Healthcare        3           Appointments, insurance questions, symptoms
Automotive        3           Service booking, maintenance, warranty claims
Legal             3           Consultations, document requests, billing
Function Calling  4           Multi-step flows, SMS sending, cross-industry tasks
Security          4           Prompt injection, data protection, role escape attempts

Each scenario is a multi-turn conversation — just like a real call — where the test plays the role of a caller and checks how your agent responds.
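Akol’s internal scenario format isn’t documented here, but you can picture a multi-turn scenario as caller turns paired with checks. This is an illustrative sketch only; the field names are assumptions, not Akol’s actual schema:

```python
# Hypothetical shape of a multi-turn eval scenario (illustrative field
# names, not Akol's real schema).
scenario = {
    "name": "dental_booking",
    "category": "dental",
    "turns": [
        {
            "caller": "Hi, I'd like to book a cleaning next Tuesday.",
            # Checks run against the agent's reply to this turn.
            "checks": [{"type": "function_called", "name": "schedule_appointment"}],
        },
        {
            "caller": "Actually, can we do Wednesday instead?",
            "checks": [{"type": "contains", "value": "Wednesday"}],
        },
    ],
}
```

The test harness plays each caller line in order and evaluates that turn’s checks before moving on.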

Reading Your Results

Overall Score

At the top of the page you’ll see:

  • Score — Percentage of scenarios that passed (100% = all green)
  • Passed — How many scenarios passed out of the total
  • Duration — How long the full test run took
  • Mode — Whether it was a mock or live test

Scenario Details

Click on any scenario to expand it and see:

  • Pass/Fail — Whether all checks passed for this scenario
  • Category — Industry, function-calling, or security
  • Turn-by-turn conversation — What the caller said, what your agent said, and what actions it took
  • Assertion results — Individual checks with pass/fail and reasons

What Gets Checked

Each scenario runs multiple checks (called assertions) against your agent’s responses:

Check                What It Verifies
Contains             Agent’s response includes a specific phrase
Not Contains         Agent’s response avoids certain words
Regex Match          Response matches a pattern
Function Called      Agent used the right tool (e.g., scheduled an appointment)
Function Not Called  Agent didn’t use a tool it shouldn’t have
Function Args        Agent passed the correct details to a tool
Response Length      Response is within expected word count
Tone                 Response has the right tone (live mode only)
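The deterministic checks above amount to simple predicates over the agent’s response text and the tool calls it made. A simplified sketch (not Akol’s implementation; check names and dict shapes are assumptions):

```python
import re

def run_assertion(check, response, calls):
    """Evaluate one assertion against a response string and a list of
    tool calls, where each call looks like {"name": ..., "args": {...}}."""
    kind = check["type"]
    if kind == "contains":
        return check["value"].lower() in response.lower()
    if kind == "not_contains":
        return check["value"].lower() not in response.lower()
    if kind == "regex":
        return re.search(check["pattern"], response) is not None
    if kind == "function_called":
        return any(c["name"] == check["name"] for c in calls)
    if kind == "function_not_called":
        return all(c["name"] != check["name"] for c in calls)
    if kind == "response_length":
        return check["min"] <= len(response.split()) <= check["max"]
    raise ValueError(f"unknown assertion type: {kind}")

# Example:
resp = "Sure, I've booked you for Tuesday at 3pm."
calls = [{"name": "schedule_appointment", "args": {"day": "Tuesday"}}]
run_assertion({"type": "contains", "value": "Tuesday"}, resp, calls)        # True
run_assertion({"type": "function_called", "name": "send_sms"}, resp, calls)  # False
```

Tone is the exception: it is judged by an AI evaluator rather than a predicate, which is why it only runs in live mode.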

Mock vs Live Mode

Evals can run in two modes:

Mode   Speed              What It Tests                                                 When to Use
Mock   Very fast (~22ms)  Deterministic checks: right function called, right wording    Quick checks, CI pipeline
Live   Slower (~2 min)    Everything in mock, plus tone and task completion judged      Before major releases, thorough testing
                          by an AI evaluator

Mock mode skips subjective checks like tone evaluation. Live mode uses an AI judge to evaluate whether the agent’s responses sound right.

Running Evals

Evals run automatically as part of the CI pipeline. Results are saved and displayed on this dashboard page.

To view the latest results, go to Dashboard > Evals. The page shows the most recent test run with all scenario results.

Evals help catch regressions — if a code change accidentally makes your agent worse at booking appointments or more vulnerable to prompt injection, the eval score will drop and flag it.
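How Akol’s CI wires this up isn’t shown here, but a score gate can be as simple as failing the job below a threshold. A minimal sketch, assuming the pass/total counts from a run are available (the threshold and function name are illustrative):

```python
THRESHOLD = 100.0  # require every scenario to pass; lower it if some are flaky

def gate_ok(passed, total, threshold=THRESHOLD):
    """True when a run's score (percent of scenarios passed) meets the threshold."""
    return passed / total * 100 >= threshold

# In CI you would read passed/total from the saved run results and exit
# non-zero on failure so the pipeline goes red, e.g.:
#     if not gate_ok(passed, total):
#         sys.exit(1)
print(gate_ok(25, 25))  # True
print(gate_ok(24, 25))  # False
```

A non-zero exit code is what actually fails the pipeline step, which is how a dropped score surfaces as a red build.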
