For many CAI teams, the road to releasing flawless customer service bots is fraught with uncertainties – from testing methodologies to ensuring a seamless transition to production. The question of what to keep in check before unveiling your creations to the world can be a daunting one.
In this interview, we sit down with Mahmoud Mahran, a seasoned expert in the realm of conversational AI, as he shares his invaluable insights into crafting exemplary customer service bots that not only meet expectations but redefine them.
Join us as we explore the strategies, challenges, and considerations that play a pivotal role in unleashing top-tier conversational experiences onto the digital stage.
Tell me about your conversational AI testing framework
To ensure that I'm releasing the journey customers are expecting, I look at testing as a source of truth on how successful and effective the designs are. That's why I always make sure testers are invited at a very early stage of the discussion.
Right after a new use case has been identified and I've given the developers some details about it, I'll set up a meeting with everyone, including the testers. In this meeting, I'll clarify for everyone what the success criteria are for determining customer satisfaction, address any emerging questions, and agree on a timeline.
First, the developers will finalize the implementation and then the testers will interact with the journey. They'll report any comments they have, or bugs they discover, and send them back either to the developers or to the designer, depending on the kind of feedback.
Once everything is fixed, the journey will be released below the line, meaning customers aren't directly invited to use the chatbot for this purpose; only in certain situations are they offered the new technical support function. For example, users already interacting with the chat, or users who contacted the technical support call center, might be invited to try solving their issue through the chatbot. This way, it's possible to see whether everything goes well in production with a small set of users, without having to cope right away with the massive volumes that come with pushing it above the line.
At this point, the conversation designer themselves will test in production and either give the go-ahead for releasing above the line, i.e., actively promoting the chatbot for the new use case it covers, meaning all our customers will be encouraged to access the chat for this purpose, or ask to pull back and fix anything they've noticed in production.
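The staged rollout described above can be sketched as a simple eligibility gate. This is a hypothetical illustration: the stage names and user segments are assumptions for the example, not the interviewee's actual implementation.

```python
# Sketch of a below-the-line / above-the-line rollout gate.
# Segment names and RolloutStage values are illustrative assumptions.
from enum import Enum


class RolloutStage(Enum):
    BELOW_THE_LINE = "below_the_line"  # journey live, but not promoted
    ABOVE_THE_LINE = "above_the_line"  # journey actively promoted to all


def should_offer_journey(stage: RolloutStage, user_segment: str) -> bool:
    """Decide whether to offer the new journey to a given user."""
    if stage is RolloutStage.ABOVE_THE_LINE:
        return True  # every customer is encouraged to use the chat
    # Below the line: only users already in the chat, or those who
    # contacted the technical support call center, are invited.
    return user_segment in {"active_chat_user", "support_call_center"}
```

A gate like this keeps the decision in one place, so moving above the line is a single configuration change rather than a code rewrite.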
There might be instances where I realize, from a business perspective, that the issues are insignificant, so I decide to go live and fix them later on. Ultimately, the major goal is to make customers happy, and that's where my priorities lie.
How soon are testers invited to start testing?
Before testers are involved, the conversation designer and I will have worked on the flow and made sure that all utterances are added to the NLU, all features are testable, and all integrations are ready.
How do testers know whether the issues they experience should be addressed by the designer or by a developer?
The testers should know and have experience with conversational AI. This means they're able to see right away whether a comment is of a more functional nature and should therefore be delivered to a developer, or whether it's more a matter of script, in which case they'll bring it to the conversation designer's attention.
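The routing rule above is simple enough to capture in a few lines. This is a hedged sketch, with category and role names invented for the example:

```python
# Hypothetical routing of tester feedback: functional issues go to a
# developer, script issues to the conversation designer.
def route_feedback(category: str) -> str:
    """Return the role responsible for a piece of tester feedback."""
    if category == "functional":
        return "developer"
    if category == "script":
        return "conversation_designer"
    raise ValueError(f"unknown feedback category: {category}")
```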
How many testers are involved for each new use case?
It depends on the journey we’re talking about. If it is a matter of adding information to an existing journey, or adding a simple functionality, one tester will suffice. This might be the case when you’re releasing a new offer, without too many functions.
If it’s a huge journey, or you’re creating a whole new chatbot, you’ll need at least two testers. Let’s say you’re a B2C company and you’re starting to work on a B2B experience for the first time. This will be a very big journey to release and many different functions will need to be tested.
How do you brief testers, when you hand them over a new prototype?
Again, it depends. For simple journeys, where it's just a matter of script or of releasing a new offer, I'll ask them to test everything, as I'll need feedback on the whole experience. For bigger ones, each function is tested separately. Going back to the previous example of releasing an altogether new B2B chatbot, I might need to confirm that the tariff migration function is working properly. In this case, I'd ask them to test this function separately and give specific feedback on it, as I know this use case attracts the majority of chatters.
How do testers deliver the feedback?
Every day at 10:30 am, there's a standup meeting where testers report on what they tested the previous day. They'll name the issues they've found and tell us who they've assigned, or are going to assign, each issue to (designer or developer), and the person responsible will communicate when the revision will be available for testing again. This way, the testers will know when they can go back and see how the feedback has been implemented.
How do you keep track of the progress of each item of feedback that testers bring up?
I generally find that meetings are the most important way to keep the team on track with current and upcoming developments before a release. Our developers keep track of the issues that arise on their side in an Excel spreadsheet, so that they can communicate with testers and add comments on the current status. This gives me a perspective on the status of the whole journey, function by function, script by script.
In parallel, testers and developers also log their efforts in Jira, so their managers can keep track of their productivity.
How often do you go back and update journeys post-launch?
Sometimes that might be needed, but that doesn’t happen very often.
Having experienced testers who have seen a huge number of journeys and have developed really meticulous processes allows you to make sure that what is released works properly. This does require going very deep with the revisions, verifying word by word, especially as Arabic has very similarly written vowels. If anything comes up, it goes back to the designer, who corrects the issue right away.
Going back to what to consider when releasing a new feature: do you have a sort of “production checklist” you refer to?
There are three things I keep an eye on when moving from testing into production:
- UI/UX, which I work on with a specialized designer, before adding the flow to it;
- Scripts, making sure there are no grammatical errors or structural mistakes;
- Functionality, i.e., does the bot reply correctly to questions, are the buttons clear, and, when pressed, do they take the user where they need to go while also updating the system correctly?
These are the things to test for and go over before making the decision to go live.
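The three-point checklist above can be expressed as a minimal release gate. This is an illustrative sketch, assuming boolean sign-off per item; the item names and gate logic are invented for the example, not a real release tool.

```python
# A minimal sketch of a three-point production checklist gate
# (UI/UX, scripts, functionality). Item names are assumptions.
PRODUCTION_CHECKLIST = ("ui_ux", "scripts", "functionality")


def ready_to_go_live(results: dict[str, bool]) -> bool:
    """Return True only if every checklist item has been signed off."""
    return all(results.get(item, False) for item in PRODUCTION_CHECKLIST)
```

Treating a missing item as a failure (via `results.get(item, False)`) means nothing can slip through simply because it was never tested.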
Inspired to explore what conversational AI can do for your company? We can help.
And if you’re still hungry for knowledge, follow us on LinkedIn for weekly updates on the world of conversational AI, or check out our Testing Bots 101 guide, where we discuss how & when to test a Conversational Interface.