HighLevel Voice AI Prompt Optimizer: Stop Guessing. Start Knowing.

April 27, 2026 · 6 min read

If you've ever deployed a Voice AI Agent and then sat there wondering: will it actually handle that edge case? What if a caller goes off-script? What if it books the wrong appointment? You already know the problem.

Testing voice agents today looks like this: open the agent, manually call it, take notes, tweak the prompt, call again, repeat. It's slow, inconsistent, and the moment you think it's ready, a real caller finds the one scenario you didn't think to test.

That ends now.

Prompt Optimizer is coming to HighLevel's Voice AI, and it's the most complete test-and-tune engine we've ever built for agents.

What is Prompt Optimizer?

Prompt Optimizer is a purpose-built testing and optimization engine for Voice AI Agents. You configure a test plan, run real calls against your agent, and score every result with AI reasoning; the system then rewrites what failed, automatically, without ever touching your live agent.

It's not a simulator. It's not a linter. It's a closed-loop system that takes your agent from "I think this is working" to "I know this is working."

As Kaustubh Kabra, Senior Manager, AI at HighLevel puts it, "Building a reliable voice agent has always been a painful loop. Agencies write a prompt, run a handful of manual test calls, guess what went wrong, tweak the prompt and repeat. It's slow, subjective and hard to tell if a change actually improved the agent or just moved the problems around. Prompt Optimizer solves this head-on. Instead of a handful of guesses, the system runs structured scenarios, scores every outcome objectively and generates targeted prompt improvements, turning days of trial-and-error into a scored, AI-driven feedback loop that runs in minutes."

The numbers say it all:

Prompt Optimizer has been live in early access, and the results speak for themselves.

In its first week, 124 voice agents were optimized across 110 locations. The system generates roughly 830 minutes of test calls every single day, over 13 hours of structured voice conversations scoring agents around the clock. In just 8 days, users ran 586 optimization cycles. The average agent improved accuracy by 20.5 percentage points over its pre-optimization baseline, and more than 75% of agents crossed the 80% accuracy threshold after optimization.

That's not incremental improvement. That's a step change.

The flow: Configure, Test, Improvise:

1. Configure: Set up your test plan

You tell Prompt Optimizer what you want to test, and AI does the heavy lifting from there. It auto-generates test scenarios based on your agent's actual prompt, language and configured actions, including caller personas, what they say, what the agent should do and how critical each scenario is.

Don't want to start from scratch every time? Load scenarios from a previous run. Testing in Spanish or French? Scenarios, calls, and evaluations all run in your chosen language. Want to stress-test consistency? Run up to 10 calls per scenario.

Before anything fires, a pre-run checklist walks you through your test contact, calendar, action warnings and daily usage, so nothing unexpected happens on the other side.
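To make the shape of a test plan concrete, here is a hypothetical sketch in Python. The field names (`persona`, `caller_says`, `expected_behavior`, `criticality`, and the rest) are illustrative assumptions, not HighLevel's actual schema.

```python
# Hypothetical sketch of a Prompt Optimizer test plan.
# Field names are illustrative, not HighLevel's actual schema.
test_plan = {
    "language": "es",          # scenarios, calls, and evaluations all run in this language
    "calls_per_scenario": 3,   # up to 10 to stress-test consistency
    "scenarios": [
        {
            "persona": "Caller who missed yesterday's appointment and wants to rebook",
            "caller_says": "I need to reschedule as soon as possible.",
            "expected_behavior": "Apologize, check calendar availability, book a new slot",
            "criticality": "high",
        },
        {
            "persona": "Caller who goes off-script and asks about pricing mid-booking",
            "caller_says": "Wait, how much does this actually cost?",
            "expected_behavior": "Answer from the knowledge base, then return to booking",
            "criticality": "medium",
        },
    ],
}

print(len(test_plan["scenarios"]))  # number of scenarios in this plan -> 2
```

The point of the sketch is the structure: each scenario pairs a caller persona with the behavior the agent should exhibit and how critical that scenario is, which is exactly what the auto-generation step produces for you.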

2. Test: Real calls. Real actions. Real results.

These aren't simulated conversations. Your agent picks up, talks, and executes real actions: knowledge base lookups, appointment bookings, SMS, emails, CRM updates. Everything fires end-to-end. You get full call transcripts with speaker labels, audio playback, tool invocation tracking, and AI scoring that explains exactly why each call passed or failed.

Watch every call move from Waiting to In Progress to Evaluating to Completed in real time. Step away, come back, results are waiting for you.

3. Improvise: AI rewrites what failed

This is the part that changes everything.

Click Improvise and the AI reads every failed transcript, identifies root causes, and generates a new, targeted prompt variation. Not a blanket rewrite, but a surgical fix. It then runs that variation through your test suite and shows you the results side-by-side with the original.

Want to go further? Click Auto Optimize and the engine runs up to 5 back-to-back variations automatically, stopping when it hits your target accuracy, exhausts variations or detects no further improvement.
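The Auto Optimize loop described above can be sketched as a simple hill-climbing routine. This is a hypothetical illustration, not HighLevel's implementation; `run_suite` and `improvise` stand in for the real test runner and prompt rewriter.

```python
def auto_optimize(run_suite, improvise, base_prompt, target=0.80, max_variations=5):
    """Hypothetical sketch of the Auto Optimize loop: try up to `max_variations`
    prompt variations, keep the best-scoring one, and stop early on hitting the
    target accuracy or detecting no further improvement."""
    best_prompt = base_prompt
    best_score = run_suite(base_prompt)      # accuracy in [0, 1] from the test suite
    for _ in range(max_variations):
        if best_score >= target:             # target accuracy reached
            break
        candidate = improvise(best_prompt)   # AI rewrite targeting failed transcripts
        score = run_suite(candidate)
        if score <= best_score:              # no further improvement detected
            break
        best_prompt, best_score = candidate, score
    return best_prompt, best_score           # best variation is tagged, not auto-applied
```

Note that the return value is a recommendation: as in the product, nothing touches the live agent until you review and apply the winning variation yourself.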

A built-in viewer shows you exactly what changed, what was removed and what was added. Every variation is stored with its accuracy score, call results, AI reasoning and prompt. The best-performing one gets tagged and you choose when to apply it to your live agent.

Your production agent is never modified during any of this. A clone runs all tests. Nothing goes live until you review the diff and click Use.

What makes this different?

Everything runs natively inside HighLevel: the tester voice bot, the optimization engine, the evaluation layer. It's a ground-up, proprietary algorithm purpose-built for voice agent optimization inside the GHL ecosystem. No third-party testing tools. No leaving your dashboard. No duct-taping separate systems together.

And because it tests the full agent, not just the conversation, you're validating the things that actually matter in production: does the appointment get booked correctly? Does the right SMS send? Does the knowledge base return the right answer?

A few things to know before you start:

Voice AI is non-deterministic by nature, so results will vary across runs and that's expected. Treat accuracy scores as directional signals, not hard thresholds. The scoring engine in V1 leans strict and we're actively tuning it with real-world feedback.
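One practical way to read non-deterministic results is to aggregate pass/fail outcomes across repeated calls per scenario and treat the mean as a trend line rather than a verdict. A minimal sketch, assuming a simple scenario-to-results data shape that is not HighLevel's actual output format:

```python
# Assumed shape: scenario name -> list of pass/fail booleans from repeated calls.
results = {
    "rebook_missed_appointment": [True, True, False],   # 3 calls, 2 passed
    "pricing_question_mid_booking": [True, True, True],  # 3 calls, all passed
}

def directional_accuracy(results):
    """Mean pass rate across all calls; read it as a directional signal."""
    calls = [passed for runs in results.values() for passed in runs]
    return sum(calls) / len(calls)

print(round(directional_accuracy(results), 2))  # 5 of 6 calls passed -> 0.83
```

Running the same suite twice and seeing 0.83 versus 0.79 is noise; seeing 0.83 climb to 0.95 after an Improvise pass is signal.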

Real actions fire during testing, including appointments, emails, and SMS, so use a test contact and a draft calendar to keep things sandboxed. Each call has a 5-minute timeout, so design your scenarios accordingly.

Every location gets 20 free minutes of testing time daily, enough to start optimizing right away.

This is what confident deployment looks like:

The gap between building an agent and trusting it has always been the hard part. Prompt Optimizer closes it, not by asking you to test more manually, but by making testing something that happens for you, systematically, with clear output at the end.

Aarat Bhatnagar, Product Manager at HighLevel, frames what this shift really means. "I've seen firsthand how much agent development has historically relied on intuition over certainty. Teams ship agents hoping they work, but without a reliable way to measure, validate and improve behavior at scale. Prompt Optimizer changes that paradigm. It brings rigor, repeatability and true feedback loops into Voice AI development. What excites me most is not just faster iteration, but the shift from prompt engineering to agent engineering, where teams can systematically improve outcomes, not just tweak inputs. This is a foundational step toward making Voice AI agents production-grade by default, not by trial and error."

It's coming soon. And when it lands, the right move is simple: open it, run your first test, and find out exactly what your agent is doing when you're not watching.

