As highlighted by the many experts we interviewed for our report Leading with Conversation Design, tests are a pivotal part of delivering a successful conversational experience. So much so that, unlike other phases of the Conversation Design process, testing is carried out throughout all the stages, from beginning to end.
In this article, we dive deeper into one of the most important parts of the bot building journey, if not THE most important: Testing.
We’ll be exploring:
- The 3 levels of bot testing that Conversation Design teams usually escalate through
- How to decide who makes for a good tester candidate & how many testers you need
- Tips and strategies for effective recruiting
- How to test: prototyping methods, testing strategies, and what the research recommends
- Bonus: Usability testing show-down: SUS vs. CUQ
The 3 Levels of Conversation Design Bot Testing
As discussed in detail in the Process part of the Leading with Conversation Design report, there are different ways testing can be carried out within the Conversation Design process. Usually, the pool of testers will expand through these 3 levels:
- First, the internal team will play with the prototype and tweak the design. While these testers have the obvious disadvantage of not accurately reflecting the end user’s specific needs and behaviors, it is crucial, especially in the initial stages, to put your design “hypotheses” to the test. For this, you need fast, iterative feedback loops.
- Once the design seems solid enough, it’s time to hand it to someone who isn’t aware of all the details related to the project at hand. This is where cross-team testing occurs.
- Finally, before you go into production, it’s a good idea to get a better understanding of how your end user (or someone like your end user) interacts with the bot. External testing will enable you to make the final, crucial tweaks that can ultimately make your bot truly successful at catering to their needs.
Who & How Many Testers Does a Bot Need?
As with many questions in the Conversation Design world, the ideal number of testers, and who those testers should be, is best defined on a project-by-project basis.
There are multiple factors that come into play here, but broadly speaking we can identify 2 main considerations to base this decision on:
- The estimated potential impact:
- If you expect a lot of people to interact with your bot, it makes sense to invest a little extra time and effort into testing. In particular, you’ll be carrying out more extensive external tests, some of which will likely involve your actual customers, while others will involve actively recruiting participants.
- With higher user volumes, you might set up periodical testing sessions to routinely check if things have changed and how your designs need to be updated.
- The type of conversational interface you’re creating. In simple terms, you want to consider whether the bot is (part of):
- A new product, or experience, that has never been available before. Recruiting external testers will be necessary in this case, as you have no available indication of what your audience’s needs and behaviors are like.
- A broader, well-established customer experience. Say you’re launching a web chatbot to help users navigate your high-traffic website. In this scenario, a subset of your website users can easily become your chatbot testers, without them even realizing that the chatbot they’re interacting with is in testing.
Tips & Strategies to Recruit Testers
Not all Conversational AI projects require formal recruiting per se. For your external test rounds, you might, for example, rely on your existing customers, simply making the option available to a subset of them and tracking their behavior.
When you decide to recruit participants and conduct more formal research, there are some precautions to take. First of all, successful recruiting starts with thorough preparation. Be sure to define your criteria for selecting participants ahead of time, decide how you will contact them, and design and create your materials.
Once all that is set, you can start screening potential candidates. If you rely on a specialist recruitment agency, on the other hand, they will take care of the screening, after you’ve provided them with a detailed recruitment screener.
Throughout your recruitment process, you’ll want to keep track of the potential candidates you speak to and be very proactive in communicating with them. Don’t forget that participants should also fill out a consent form. And whether or not they end up being part of your research, make sure you thank them and show appreciation for the time they offered you.
How to Test Conversational Interfaces: Prototyping methods, Tips & Usability Frameworks
The most important thing you need to be able to start testing is, of course, a prototype. As we discussed in more detail in Road to Conversation Design, there are lots of different methods to bring a prototype to life. You can go as low- or high-tech as you want with this. Examples of prototyping methods we discussed include:
- Table reads
- Wizard of Oz testing
- Interactive prototypes that you can easily put together with tools like Voiceflow
Still, the setup for testing a conversational interface doesn’t end at the prototype. Before you put the bot in front of someone, you’ll want to:
- Define what tasks you’ll be asking your testers to perform, including both:
- Tasks with very clear and specific instructions of what you want them to do
- “Blind” tasks, where the tester is asked to perform an action, without specific instructions
- Decide the order in which the tasks will be presented to the testers, and whether you want to vary it across testers
- Think about what metrics you want to measure and what behaviors you want to monitor, making sure that your own conduct as a researcher doesn’t bias your results (see the Observer-Expectancy Effect)
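One way to handle task ordering is to randomize it per tester, so that order effects (fatigue, learning) don’t consistently skew any one task’s results. Here’s a minimal Python sketch; the task names are purely illustrative, and seeding by tester ID is just one way to keep each tester’s order reproducible:

```python
import random

# Hypothetical task list for a support bot (names are illustrative only).
TASKS = [
    "Find the store's opening hours",        # specific instructions given
    "Track an existing order",               # specific instructions given
    "Get help with a return (blind task)",   # no step-by-step guidance
]

def task_order_for(tester_id: str, tasks=TASKS):
    """Return a per-tester task order, seeded so the same ID yields the same order."""
    rng = random.Random(tester_id)  # deterministic shuffle per tester
    order = list(tasks)
    rng.shuffle(order)
    return order

print(task_order_for("tester-01"))
```

For small panels you could instead enumerate all orderings (a Latin square style rotation) and assign them round-robin, guaranteeing every order appears equally often.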
In terms of metrics and tools for chatbot and voicebot evaluation, there are different elements and approaches you can adopt. As with all kinds of tests, there are 2 kinds of metrics you can collect and combine:
- Quantitative metrics. For example, in this research studying a voice/hybrid interface, the number of task failures, waiting times, and time spent on speech/writing were monitored.
- Qualitative metrics, which can refer to the outcome of interviews and/or surveys, but also to the observation of the user’s body language and behaviors.
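To make the quantitative side concrete, here’s a minimal Python sketch that aggregates success rate and mean time on task from session logs. The log structure and numbers are hypothetical, purely to illustrate the kind of per-task rollup you might compute:

```python
from statistics import mean

# Illustrative session logs: one dict per tester attempt at a task (fake data).
sessions = [
    {"task": "track_order", "completed": True,  "seconds": 48.0},
    {"task": "track_order", "completed": False, "seconds": 95.0},
    {"task": "track_order", "completed": True,  "seconds": 61.0},
]

def task_metrics(logs):
    """Aggregate simple quantitative metrics: success rate and mean time on task."""
    success_rate = sum(s["completed"] for s in logs) / len(logs)
    mean_time = mean(s["seconds"] for s in logs)
    return {"success_rate": round(success_rate, 2), "mean_seconds": round(mean_time, 1)}

print(task_metrics(sessions))  # {'success_rate': 0.67, 'mean_seconds': 68.0}
```

In practice you’d log these automatically from your prototype tool or transcripts rather than by hand, and slice them per task and per tester group.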
Bonus: Usability testing show-down: SUS vs. CUQ
Different tools have been used and discussed in the research on usability tests for conversational interfaces. Some, like the SUS (System Usability Scale) and the UEQ (User Experience Questionnaire), are borrowed from UX research. Others have been developed to better capture the specific kind of human-computer interaction enabled by chatbots, as in the case of the CUQ (Chatbot Usability Questionnaire) and BUS-15 (Bot Usability Scale), or by voice user interfaces, as seen in the VORI (VOice useR Interface Interactability) framework.
SUS and CUQ are two of the most popular tools discussed in the literature on usability testing for conversational interfaces.
The first, the System Usability Scale (SUS), developed in 1986 by John Brooke, is often seen as a “quick and dirty” method for assessing conventional computer systems and, particularly, GUIs.
The System Usability Scale (SUS), via ResearchGate
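SUS scoring is standardized: each of the 10 items is answered on a 1-5 Likert scale; odd-numbered (positively worded) items contribute `response - 1`, even-numbered (negatively worded) items contribute `5 - response`, and the summed contributions (0-40) are multiplied by 2.5 to yield a 0-100 score. A small Python implementation of that formula (the example responses are made up):

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from 10 Likert responses (each 1-5).

    Odd-numbered items (positively worded): contribution = response - 1.
    Even-numbered items (negatively worded): contribution = 5 - response.
    The summed contributions (0-40) are scaled by 2.5.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly 10 responses in the range 1-5")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based, so even i = odd item
        for i, r in enumerate(responses)
    )
    return total * 2.5

# Illustrative: a fairly positive response set.
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
```

Note that a SUS score is not a percentage: 68 is roughly the average across studies, so scores should be read against that benchmark rather than a school-grade scale.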
While SUS can be (and is) used to assess chatbots and VUIs, many researchers in this domain argue against it. As an alternative, researchers at Ulster University have proposed the Chatbot Usability Questionnaire (CUQ).
The CUQ assesses aspects that are more relevant to evaluating conversation-driven systems, focusing on the categories of personality, onboarding, navigation, understanding, responses, error handling, and intelligence.
The Chatbot Usability Questionnaire (CUQ), via Ulster University
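The CUQ follows a similar scoring logic to SUS, with 16 items on a 1-5 scale, odd-numbered items positively worded and even-numbered items negatively worded. As we understand Ulster’s published calculation tool, the combined contributions are scaled by 1.5625 to map onto 0-100; a sketch under that assumption:

```python
def cuq_score(responses):
    """Compute a CUQ score scaled to 0-100 from 16 Likert responses (each 1-5).

    Assumes the scoring used by Ulster University's CUQ calculation tool:
    odd-numbered items are positively worded, even-numbered items negatively
    worded, and the result is scaled by 1.5625 so a perfect sheet scores 100.
    """
    if len(responses) != 16 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("CUQ needs exactly 16 responses in the range 1-5")
    odd_sum = sum(r for i, r in enumerate(responses) if i % 2 == 0)   # items 1,3,...,15
    even_sum = sum(r for i, r in enumerate(responses) if i % 2 == 1)  # items 2,4,...,16
    return ((odd_sum - 8) + (40 - even_sum)) * 1.5625

# Illustrative: agreeing with every positive item, disagreeing with every negative one.
print(cuq_score([5, 1] * 8))  # 100.0
```

Because both scales land on a 0-100 range, CUQ results can be compared side by side with SUS benchmarks, which is part of its appeal for chatbot teams.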
Inspired to explore what conversational AI can do for your company? We can help.
And if you’re still hungry for knowledge, follow us on LinkedIn for weekly updates on the world of conversational AI, or check out our article where we discuss whether end-to-end, cross-functional teams à la Miro or Spotify could be on the horizon for conversational AI.