Conversational AI Metrics Game: How Are Chatbots Measured

Implementing a chatbot, or any other type of conversational AI, to support your business is one thing. But how do you measure a chatbot’s performance and understand how effective it is?

Metrics and KPIs are a hotly debated topic in the conversational AI space. As with all initiatives, figuring out how a chatbot or, more broadly, conversational AI automation is helping a business is fundamental.

At the same time, the fact that the interactions between your customers and your bot are conversational can make it very difficult to figure out how to mark success versus failure.

Furthermore, chatbots and voicebots aren’t usually deployed in a vacuum. They’re an additional channel within a well-orchestrated customer experience, which means they need to act well on their own, as well as in concert with the other channels.

These and other circumstances specific to the management of conversational AI experiences make the measurement of their performance so complex. In this article, we clear the clouds on the topic of chatbot and conversational AI metrics, discussing:

  • The ground rules, i.e. setting the scene on the common concerns and solutions when it comes to CAI measurement
  • The Metrics Game, i.e. a framework for thinking about chatbot metrics and KPIs
  • Experts’ takes on how to choose and analyze conversational AI metrics
  • Experts’ takes on using NPS, CSAT, or similar surveys in a conversational AI context.

Some ground rules

Before jumping into our Metrics Game framework, there are some ground rules to be laid. These will give some context on the common concerns and solutions that CAI professionals face when it comes to metrics:

  • Without metrics and KPIs, it can be hard to know whether what you’re doing is working or not, especially as the volumes of the interactions increase, or the number of users is particularly high.
  • Keep track of different metrics to get the whole picture and be able to pinpoint exactly at what point things are going wrong
  • …but know which ones measure your success and goals and, thus, are your priority. It can be confusing and even unproductive to try to cover all bases. Focusing on a couple of key metrics that tell you how well you’re doing on your goals is much more helpful.
  • Understand what different kinds of metrics can offer you. Some KPIs are great indicators of the bot’s performance, whereas survey results speak more clearly about customer experience. Similarly, you’ll be able to distinguish between metrics that track the bot’s individual performance and those that account for all channels together.
  • Be skeptical of surveys. They’re necessary for reporting to other units, and they’ll probably give you some insight into the hitches customers face on their journeys. However, many variables can skew your NPS/CSAT, such as when the question is asked, a low response rate, and the difficulty of telling whether a rating refers to the content of the answer or to its accuracy.
  • At the same time, you don’t want to be snobby about any metric. Yes, NPS and CSAT aren’t perfect, but they can sound the alarm and notify you that some journeys need closer attention and analysis.

The Metrics Game: Single Player vs Team Performance

Yes, we’re Italians at heart. Yes, we love a soccer (or football ⚽) metaphor. Whether you’re a fan or not, bear with us: if Ted Lasso was able to manage AFC Richmond, you’ll definitely be able to follow this, too.

Soccer managers and conversational AI people have at least one thing in common: a (healthy) obsession with performance. All the scouting for the best players, all the training (physical, mental or NLU-related), and all the technical team coordination serve one thing: the live games.

The stress here is on the plural, games, as it’s never about a single match. The first game of the season sure is important, but it’s known, in sport as in CAI, that it takes some time to adjust, learn from the mistakes, and train again to be prepared, should a similar situation occur again.

How do teams do this? By keeping track of team dynamics, as well as each player’s performance.

This two-pronged approach is one that is very beneficial in conversational AI contexts, too, where you want to understand how your star player, i.e. the bot, is performing, as much as the whole team dynamics and results, i.e. the broader customer journey and all channels flowing into customer experience. 

In keeping with the soccer metaphor, from the whole roster of possible metrics out there, you’ll want to select a lineup that tackles the two fronts:

  • Bot-specific metrics, to show you how the single player is performing;
  • Customer experience metrics, to reveal how the audience is perceiving the overall interaction with the company.

From the outside, the second set of metrics will naturally be more important, but the first set of metrics will be more useful in deriving actionable insights on areas of improvement for the CAI team.

Experts’ takes: Choosing & Analyzing Conversational AI Metrics

As with all things in conversational AI, choosing the right lineup of metrics is all about your own situation and, most importantly, goals.

Are you looking to free up customer service? 

Then a metric like first chat resolution will likely be relevant.

Is the bot supposed to enhance the current customer experience? 

You might need to spend more time in the transcripts or think of good ways for collecting feedback.

From talking to various conversational AI experts, we’ve seen how difficult it can be to find the right metrics and how important it is to dig deep behind the numbers. In this section, we share some of the most interesting takes professionals have shared with us.

Read the ingredients list

A Conversational AI Product Owner working in the Banking sector: “In terms of the metrics that are measured in a platform out-of-the-box, I’d recommend paying attention to what definition that specific vendor uses for that KPI. Something I noticed in the market is that everyone assumes they understand what common metrics, like containment, dropouts, answer given, mean. Often, though, different companies will have different definitions of them and not being aligned with your platform on this can skew your analyses.”
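To make the point about diverging definitions concrete, here is a minimal Python sketch. The `Conversation` record, its field names, and both definitions of containment are hypothetical illustrations, not any vendor's actual formula; the only aim is to show that the same transcripts can yield two different "containment" numbers.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    handed_over: bool  # the bot escalated to a human agent
    abandoned: bool    # the user left before reaching an end state


def containment_a(convs):
    """Hypothetical definition A: anything without a handover is contained."""
    return sum(not c.handed_over for c in convs) / len(convs)


def containment_b(convs):
    """Hypothetical definition B: contained means no handover and no abandonment."""
    return sum(not c.handed_over and not c.abandoned for c in convs) / len(convs)


# Four made-up conversations: one abandoned, one handed over, two completed.
sample = [
    Conversation(handed_over=False, abandoned=False),
    Conversation(handed_over=False, abandoned=True),
    Conversation(handed_over=True, abandoned=False),
    Conversation(handed_over=False, abandoned=False),
]

print(containment_a(sample))  # 0.75
print(containment_b(sample))  # 0.5
```

On this toy data, definition A reports 75% containment and definition B reports 50%, which is the kind of gap that goes unnoticed if the definition behind a dashboard number is never checked.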

Keep your eyes on the endpoints

The same CAI PO: “It really depends on the purpose of your application. If you build a conversational IVR, containment doesn't matter. If you build a voicebot to help out your contact center agents, then containment is a very important metric. So, it’s really about understanding what’s the purpose of what you’re building and defining tailored KPIs that can measure what you care about.

In general, one thing that’s really important is to define your set of KPIs on the endpoints of your application, for you to understand where the customer journey ended in your application.”
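One possible way to act on this advice is to tag every conversation with the endpoint where it finished and look at the distribution. The sketch below assumes three hypothetical endpoint labels (`resolved`, `handover`, `dropout`); a real application would define its own set based on its purpose.

```python
from collections import Counter

# Hypothetical endpoint labels; a real application would define its own.
ENDPOINTS = {"resolved", "handover", "dropout"}


def endpoint_shares(final_states):
    """Return the share of conversations that ended at each endpoint."""
    unknown = set(final_states) - ENDPOINTS
    if unknown:
        raise ValueError(f"unexpected endpoint labels: {sorted(unknown)}")
    if not final_states:
        return {endpoint: 0.0 for endpoint in sorted(ENDPOINTS)}
    counts = Counter(final_states)
    total = len(final_states)
    return {endpoint: counts[endpoint] / total for endpoint in sorted(ENDPOINTS)}


# Made-up final states for five conversations.
shares = endpoint_shares(["resolved", "handover", "resolved", "dropout", "resolved"])
print(shares)  # {'dropout': 0.2, 'handover': 0.2, 'resolved': 0.6}
```

Rejecting unknown labels is a deliberate choice here: a conversation that ends somewhere you did not anticipate is exactly the case you want to notice rather than silently drop.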

Combining metrics with other approaches

Amir Kiabi, Digital Transformation & Improvement Manager: “The most important metrics would be:

  • Good match rate (or Success rate)
  • Automated chats
  • CSAT (Customer satisfaction). 

We also looked at whether the customer accepted the answer, especially when the bot provided an answer that is correct, but that they might not want to hear. Constantly monitoring the tone of voice of the answers is also crucial, but hard to use as a metric.”

Gotta pick your metrics battles

Esha Metiary, Senior Conversation Designer: “Companies usually aim for a higher NPS and low handover rate (or call to customer support), but, from my experience, the two do not really go together.

There's only so much a chatbot can do and so much information it can give without becoming incomprehensible, which means there are a lot of cases that are either too complicated or too sensitive for the chatbot to explain.

If someone says the order they placed was for a person who went missing, the chatbot should do an immediate handover to a human, who can better handle this touchy situation. The customer might end up being very satisfied with how they are helped and give a high NPS score, but their call contributed to increasing the handover rate.

Personally, I think NPS is the better metric to pursue of the two, but, either way, I’d advise shooting for one of them, rather than both at the same time. Of course, there are also internal metrics, like containment, accuracy, recall, but I like to focus on the external metrics, because those matter the most for the customer as well.”

Testing vs. post-launch metrics

Mahmoud Mahran, Conversational AI expert: “The telltale sign of testing success, for me, is mainly to see that the journey is ready and complete. To verify the success of the chatbot, there are many popular metrics, like containment rate, first chat resolution (or handover rate), and various NLU performance criteria, like confidence, correctness, clarity. I also keep track of experience, sending a survey question through the chat that goes: ‘How likely are you to recommend XYZ company’s products and services to your family and friends?’ ”
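The survey question quoted above is the classic Net Promoter Score question. As a minimal sketch, assuming the standard 0 to 10 answer scale (the quote does not specify one), NPS is the percentage of promoters (ratings of 9 or 10) minus the percentage of detractors (ratings of 0 to 6). The function name and sample ratings below are illustrative.

```python
def net_promoter_score(ratings):
    """Compute NPS from 0-10 ratings: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100 * (promoters - detractors) / len(ratings)


# Made-up survey answers: 5 promoters, 2 passives, 3 detractors.
nps = net_promoter_score([10, 9, 8, 7, 6, 3, 10, 9, 5, 10])
print(nps)  # 20.0
```

Note that passives (7 and 8) only affect the score through the denominator, which is one reason a small number of responses can swing the result considerably.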

No matter how big you are, never forget about transcripts

One thing some might be surprised to hear is how common it is for conversational AI teams in big organizations with really high volumes to take time to read through and analyze the transcripts regularly.

Metrics are a great way to get information at a glance, but the real data is in the transcripts. That’s where you can see how the customers are talking with your bot, what issues they’re stumbling over, and catch the subtleties that numbers don’t always pick up on.

Experts' takes: NPS/CSAT Surveys in Conversational AI?

One of the most hotly debated topics when it comes to metrics and ways to measure performance in conversational AI products is NPS, CSAT and, generally, customer experience surveys.

On the one hand, chatbots and voicebots are expected to improve customer experience, and customer satisfaction surveys are meant to measure exactly that.

On the other hand, most experts, even those who believe in the usefulness of NPS or CSAT, agree that these are not great at measuring the effectiveness of the bot itself.

We turn, once again, to the opinions of our experts, to help you get a sense of how to approach this unsettled issue in conversational AI data and performance tracking.

NPS as a way to bring awareness to conversational AI

Amir Kiabi, Digital Transformation & Improvement Manager: “We did use NPS as a metric, but not specifically for the CAI. However, I believe that when implementing a new solution like CAI, NPS is highly recommended, not as a way to measure the CAI itself, but rather to make users aware of the existence of this new channel of communication. For example, you could add a question in the survey saying: ‘We have recently upgraded our digital communication channels. Have you tried them? If yes, please rate your experience.’ ”

Understand what’s for reporting and what’s actually measuring performance

The CAI Product Owner working in the Banking sector: “On the one hand, NPS is really often the only customer feedback metric that sticks organization-wide, so using it enables you to cater to your stakeholders. On the other hand, it’s not really suitable from a practical perspective to assess how successful your bot is. 

So, I’d consider it as more of a reporting metric, than a performance one. If you use it, make sure you also bring this up, indicating what would be better alternative metrics to look at to assess performance (e.g., containment, hand-over rate, dropout).”

A tricky yet still widely used metric

Dr. Lilian Balatsou, AI Evangelist: “NPS is a little bit of a tricky measure, because it doesn't measure the conversational agent and it usually measures the overall experience. Still, companies use it and they make decisions on it. Ultimately, if you deploy an AI and NPS drops, that’s something you need to fix. That’s why this metric is still important when we talk about customer-facing AI.”

It’s all in the timing

Esha Metiary, Senior Conversation Designer: “I don't particularly like NPS, but I don't like CSAT, or all the other scores, either. That is because the NPS or the CSAT question usually follows almost immediately after the customer’s interaction. At that moment, emotions are high and the customer’s view of the chatbot can be skewed.

From my experience, it's better to ask the NPS (or CSAT) question after a couple of days, when the feeling has subsided and they have had some time to think about it. 

Still, a lot of times the feedback can be negative for reasons outside of the bot, e.g., because the customer didn't get the answer that they actually wanted. I might not like to hear that my package is delayed, but that is just how it is and the chatbot cannot help it.

Overall, it's not a completely fair metric, but it is advisable, as a company, to strive towards better NPS. That is how you know that you've actually helped the customer.”

The secret score that no one is really applying… yet

Mahmoud Mahran, Conversational AI expert: “If you ask me, from my own perspective, NPS doesn’t do it. The best metric you could use to measure experience, in my opinion, would be customer effort score, which asks users to rate the interaction with the assistant from very easy to very difficult. Unfortunately, this isn’t always applied in CAI verticals, nor in customer support, in general.”
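As a rough sketch of how a customer effort score could be summarized, the example below assumes a 5-point scale where 1 means "very difficult" and 5 means "very easy". Both the scale and the two summaries (the mean rating and the share of "easy" answers) are assumptions for illustration; organizations use different scales and reporting conventions.

```python
from statistics import mean

# Assumed 5-point scale: 1 = very difficult ... 5 = very easy.
SCALE_MIN, SCALE_MAX = 1, 5


def validate(responses):
    """Reject empty input and answers outside the assumed scale."""
    if not responses:
        raise ValueError("at least one response is required")
    if any(not SCALE_MIN <= r <= SCALE_MAX for r in responses):
        raise ValueError(f"responses must be between {SCALE_MIN} and {SCALE_MAX}")


def average_effort_score(responses):
    """Mean rating on the assumed scale (higher means less effort)."""
    validate(responses)
    return mean(responses)


def easy_share(responses):
    """Share of respondents who answered 4 or 5, i.e. found the interaction easy."""
    validate(responses)
    return sum(r >= 4 for r in responses) / len(responses)


# Made-up answers from eight users.
answers = [5, 4, 4, 2, 5, 3, 1, 4]
print(average_effort_score(answers))  # 3.5
print(easy_share(answers))            # 0.625
```

Reporting the share of easy interactions next to the mean can help, since an average alone hides whether users are uniformly lukewarm or split between very easy and very difficult experiences.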

In conclusion

As this roundup of expert takes has shown, there is no consensus on the best metrics and strategies for monitoring bot performance. After all, each organization is unique, which makes any attempt at a one-size-fits-all approach futile.

Finding one’s own approach and objectives in conversational AI and establishing metrics that can keep track of custom goals is, thus, essential. Exploring what experts have found in their journeys, as we did in this article, can hopefully help you in this search. 

From our experience and what our experts have shared, there is one more general lesson to be learned: it doesn’t make sense to focus only on bot-specific metrics, nor does it help to look only at customer experience. A combination of the two is needed to be able to tell the whole story and to report to different stakeholders.
