A quick guide to getting conversational user interfaces right
2016 was dubbed the year of conversational systems and natural language interfaces (NLI). Without a doubt, there is significant appeal in being able to address a machine using the same natural language we use in everyday human-to-human interaction. With the exploding popularity of mobile messaging apps, A.I. technologies, particularly Natural Language Processing (NLP) and Machine Learning, have progressed to the point where this new type of human-machine interface is on the cusp of mass-market adoption. Siri, Google Now, Alexa, and Slackbots are a few high-profile examples of voice or text-driven experiences that have already gained a good level of real-world use. These and many other AI systems are now open and mature enough to allow companies and enterprises to begin interacting with customers or exposing internal processes through natural language.
When designed right, conversational solutions or natural language interface based systems, a.k.a. 'Virtual Agents' or just 'Agents', provide several concrete benefits that are not possible with conventional WIMP ("windows, icons, menus, pointer") interfaces. Engagement is much deeper, and going through customer conversation logs can yield insights that were never available before. Data presentation can be focused and highly contextualised, unlike traditional "dashboards", and when combined with machine learning, these systems can get better the more they interact with the user. Interacting using natural language, whether voice or text, offers another level of speed and immediacy. Quite simply, we can communicate more quickly by talking or typing than by clicking or swiping through screen interfaces, no matter how simple those interfaces may be.
I have had the opportunity to conceptualise, design, implement and validate conversational solutions, from primitive rule-based and scripted 'bots' to advanced AI-based agents. Based on my experience, I have attempted to capture some key design principles that you should consider as you embark on this new era of conversational solutions.
First things first - the 3 golden rules
Don't fake it to kick it!
Don’t pretend to be human. Two things can happen if you do: a) the user does not find out for most of the engagement, and if they eventually do, they feel like they don’t understand how the system works, or b) they feel deceived at finding themselves talking to a machine when they were not expecting to. Both are bad user experiences!
Also, it's important to know when to escalate or 'hand over' the engagement to a real human, and there must be an option for users to opt out and go straight to a real human ("I'd rather talk to a real human, can you please stop and put me on to someone real?").
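As a rough illustration of the hand-over rule, here is a minimal Python sketch. The phrase pattern, the threshold of two failed turns and the function name `should_hand_over` are all assumptions made for this example, not part of any particular platform.

```python
import re

# Assumed opt-out phrasing; a real deployment would tune or replace this list.
HANDOVER_PATTERN = re.compile(
    r"\b(real|actual|live)\s+(human|person)\b"
    r"|\b(talk|speak)\s+to\s+(someone|somebody)\b",
    re.IGNORECASE,
)

# Assumed threshold: escalate after this many consecutive turns that the
# agent failed to understand.
MAX_FAILED_TURNS = 2


def should_hand_over(message: str, failed_turns: int) -> bool:
    """Return True when the conversation should be passed to a human."""
    if HANDOVER_PATTERN.search(message):
        return True  # an explicit opt-out always wins
    return failed_turns >= MAX_FAILED_TURNS
```

The explicit opt-out is checked first so that a user who asks for a person is never made to "earn" the escalation by failing a few more turns.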
Tell what you know (and what you don’t)
Humans let people know what their expertise is and how they can help and when they cannot. Conversational solutions should do the same.
Every time I have got a user to test a conversational solution, and in spite of my clearly explaining the purpose of that agent, the users just cannot help asking general (out-of-domain) questions. There is an inclination to assume more intelligence in an agent than there is, because the interface and engagement are so different from what users are used to and, of course, because of a lack of understanding of the kinds of AI (general vs. narrow, weak vs. strong). It's OK to let the user know (even repeatedly, if you have to) that what they are interacting with is not general, self-aware AI, but instead an agent that is specifically trained on a certain domain.
Some 'small talk' ("Are you for real?", "I am feeling tired") is good and in fact almost necessary, but know where to draw the line.
KISS - Keep it simple silly!
Conversations and dialogues should be really simple, bounded to very particular topics, and should follow a linear conversation pattern as much as possible. Interactions should be short and precise and should have mechanisms to avoid getting trapped in back-and-forth conversations. Limit exchanges to no more than 2 sets of back and forth when there is no need to get into deep dialogues (decision making, choosing from multiple options, multi-level disambiguation, etc.). It's also important to note that the majority of users will be using a mobile device, most likely a phone, to access the system, and therefore text responses must fit within the screen, allowing the user's question and the response to be on a single screen. Scrolling through long answers is a bad conversational experience.
Define Purpose - What, Why & How
First, validate the use-case for the conversational solution. The perfect conversational use-case is one where users frequently ask similar questions and where the questions are not ambiguous. It must be acknowledged that conversational solutions just do not work in certain cases, for example news (scrolling and clicking a headline on a website or GUI app is much faster) and other interactions where the number of options to choose from is large and decision making is complicated.
The next thing is to define the purpose of the solution, thereby identifying the end goals upfront, and to set some kind of measurable outcomes. Is the solution expected to increase sales, decrease customer contact centre costs, provide concierge support or increase usage of a website? ("Agent-based sales should account for 20% of total sales", "Agent must handle 80% of concierge support")
Some key points to flesh out upfront:
Informational or Transactional
This is a decision to be made upfront, and it will dictate how conversation flow design is carried out. Will the solution be used as an informational agent (hotel concierge, student assistant, tourist guide, football coach) or will it be used for a transactional purpose, to carry out a specific action (order Pizza, Log Incident Ticket, sell t-shirt)?
Example: the pizza ordering agent (Transactional) must have a mechanism to capture all the critical inputs (entities) needed to complete the order.
Who is the system representing? Is it an individual (Tax Advisor, Salesperson, CEO), a team (Finance, IT Help Desk), a department (Customer Support, Student-Faculty) or a company (Skyscanner, MelbourneWeather)? Also determine if the agent should speak in first ("I can certainly help you with that query.."), second or third person narrative ("FlightBooker can help you with that").
What role is the solution expected to play, how much of that is expected to be fulfilled by the agent and which parts need escalation to a real human? This is almost like writing a job description to hire a real human to do the same job. Like any job description, it should have Duties & Responsibilities and Performance Goals.
Proactive or Reactive
Determine how proactive or reactive the solution must be. A proactive solution is designed very differently: the agent actively queries the end user for more information or requests certain actions. For example, an agent whose purpose is to actively sell or cross-sell a product should have a high degree of proactiveness ("I see that you are struggling with options, can I help you decide?", "Thanks, would you like matching shoes with that dress?", "What else would you like to know?").
Reactive conversational solutions, on the other hand, are designed to only respond to queries and most times fall into the informational agent category. Many a time a hybrid of proactive and reactive agents works best. Even in an informational agent case, there will be times when the agent needs to proactively seek additional information to provide a certain answer, and in the case of a proactive agent, being proactive all the time may leave the end user with a "pushy salesman" experience.
Create the Persona
The core part of a conversational solution is to figure out who or what is going to hold conversations with your end-user. You'll also need to figure out what tone of voice makes sense for your agent and what that tone is backed by. This depends on several factors, like how much you’d want to humanise your conversational system, how intelligent you want it to appear, and how formal or informal the conversations should be. The tone and personality of the conversational system will have a great impact on how successful the solution is and how well it is accepted by the end-user.
The personality and tone must reflect the brand the system is representing and must also take into consideration the target audience (end-users) and the nature of the questions that will be asked.
If you don't already have a character design to work with, a basic sketch of who's talking may suffice. It should include some keywords and what kind of things this person will or won't say, in order to provide a rough picture.
Contextualise - Know your user
Contextualisation is one of the most important factors in keeping users engaged, and it gives the end-user a notion of intelligence. Example: if a student is using the University's Virtual Assistant solution to find their way around campus, then knowing which of the several university campuses the student is logged in from allows very different (and contextualised) responses to "Where is the Library?"
Determine upfront what context information is available about the user, or which can be captured easily at the beginning of the conversation, and write answer variations based on that context information.
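One simple way to organise answer variations is to key them by intent and context value, with a fallback that asks for the missing context. The sketch below is only an illustration: the intent name, the campus keys and the answer texts are invented for this example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical answer variations keyed by (intent, campus).
ANSWERS: Dict[Tuple[str, str], str] = {
    ("find_library", "city"): "The library is on level 2 of the main building.",
    ("find_library", "north"): "The library is next to the student centre.",
}

# Hypothetical fallbacks used when the context is missing or unknown.
FALLBACKS: Dict[str, str] = {
    "find_library": "Which campus are you on? I can then point you to its library.",
}


def answer(intent: str, campus: Optional[str]) -> str:
    """Pick the answer variation that matches the user's campus, if known."""
    if campus is not None and (intent, campus) in ANSWERS:
        return ANSWERS[(intent, campus)]
    # No usable context: ask for it instead of guessing.
    return FALLBACKS[intent]
```

Here the same question, "Where is the Library?", resolves to a different answer per campus, and the agent asks a clarifying question only when the campus is not already known.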
Writing Conversation Flows
Greet & Introduce
The conversational system must have a simple and effective greeting statement, and it is always recommended to suggest a next step or present a 'call to action' as part of the introductory greeting. This holds true even if you have decided to keep the solution "reactive". The introduction should also convey the purpose of the conversational solution.
Validate Input to be sure
Users of a conversational system are free to, and often do, use different words to describe the same thing, so it's always good to validate the input data, particularly when you are trying to capture a critical input/entity. Repeat it back to ensure that everything is correct, and then move on to the next question.
"I want to leave Melbourne tom"
"Got it! Flying out tomorrow, the 6th of Feb. And when would you like to return?"
Disambiguate carefully, but quickly
Disambiguation is required when the input data is either partial or matches more than one intent. Disambiguation may require you to get into a dialogue until the intent is matched. Ensure disambiguation dialogues do not run deep, have mechanisms to exit deep conversations, and provide options that the users can click.
Example: "Did you mean:
- Paid Internship?
- Unpaid Internship?
- Volunteer work?"
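A common way to decide when to disambiguate is to compare the scores of the top-ranked intents and offer a short list only when they are close. The sketch below assumes your NLP layer returns a score per intent; the margin of 0.15, the cap of three options and the function names are illustrative choices, not fixed rules.

```python
from typing import Dict, List


def candidate_intents(
    scores: Dict[str, float], margin: float = 0.15, max_options: int = 3
) -> List[str]:
    """Return the intents whose score is within `margin` of the best one."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    top_score = ranked[0][1]
    close = [intent for intent, score in ranked if top_score - score <= margin]
    # Cap the list so the prompt stays short enough for a phone screen.
    return close[:max_options]


def disambiguation_prompt(options: List[str]) -> str:
    """Format the options as a short list the user can pick from."""
    lines = ["Did you mean:"] + [f"- {option}?" for option in options]
    return "\n".join(lines)
```

When `candidate_intents` returns a single intent there is nothing to disambiguate and the agent can answer directly; the prompt is only needed when two or more intents are returned.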
Capture Critical Inputs
List all the critical inputs that are needed to complete a certain transaction, or before certain advice is provided by the conversational system. Having all this captured in the design phase and working backwards from there to figure out how to gracefully obtain these from the user is often much easier. The critical inputs need not be captured in a certain order.
Example: if the Pizza Order solution needs Pizza Size, Flavour and Delivery Address, these can be captured in any order based on how the conversation goes.
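Order-independent capture is often implemented as "slot filling": every message is scanned for any of the critical inputs, and the agent only asks for whatever is still missing. Here is a minimal sketch for the pizza example. The slot names, the vocabulary lists, the regular expression for the address and the prompt texts are all assumptions made for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical vocabularies for the pizza example.
SIZES = ("small", "medium", "large")
FLAVOURS = ("margherita", "pepperoni", "hawaiian")
REQUIRED_SLOTS = ("size", "flavour", "address")

PROMPTS = {
    "size": "What size would you like: small, medium or large?",
    "flavour": "Which flavour would you like?",
    "address": "Where should we deliver it?",
}


def extract_slots(message: str) -> Dict[str, str]:
    """Pull out whichever critical inputs appear in this message."""
    text = message.lower()
    found: Dict[str, str] = {}
    for size in SIZES:
        if re.search(rf"\b{size}\b", text):
            found["size"] = size
    for flavour in FLAVOURS:
        if re.search(rf"\b{flavour}\b", text):
            found["flavour"] = flavour
    # Very rough address capture: everything after "deliver(ed) to".
    match = re.search(r"\bdeliver(?:ed)?\s+to\s+(.+)$", message, re.IGNORECASE)
    if match:
        found["address"] = match.group(1).strip(" .")
    return found


def next_prompt(order: Dict[str, str]) -> Optional[str]:
    """Return the question for the first missing slot, or None when done."""
    for slot in REQUIRED_SLOTS:
        if slot not in order:
            return PROMPTS[slot]
    return None
```

A conversation loop would call `order.update(extract_slots(message))` on every user message and then ask `next_prompt(order)` until it returns `None`, so a user who volunteers the address first is never asked for it again.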
Use Rich Media but sparingly
Many conversational platforms support some level of rich media in the messages. This is something that requires special consideration when thinking of a content strategy for conversations. Not all media types may be applicable to all use-cases, but the quick response ones, like Yes/No buttons and Option Cards (colour variations for a product), work really well, especially for critical input capture and for disambiguation flows. Video and images (especially large ones) are rare and may not work well in most cases; they could interrupt the conversational flow. Emojis and animated GIFs can be used, but it depends on the personality of the agent.
Get User Feedback
Many solutions out there, even mature and full-fledged conversational systems, lack a feedback mechanism. This is an important feature, at least during the early *Training* stages. It can be incorporated in two ways: 1) as a Like/Dislike button that is part of the conversational UI (thumbs-up/thumbs-down) or 2) as part of the conversation flow itself ("Did that help?" / "Is that what you meant?"). Either way, there must be a means to identify this feedback in offline user conversation logs so that it can be fed back into training or learning.
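To make feedback identifiable in the logs, one option is to store it as a field on each logged turn and filter on that field later. The sketch below uses JSON lines held in a list; the field names and function names are assumptions, and a real system would write to a file or a logging service instead.

```python
import json
from typing import List, Optional


def log_turn(
    log: List[str], question: str, answer: str, helpful: Optional[bool]
) -> None:
    """Append one conversation turn as a JSON line.

    `helpful` is True/False for thumbs-up/down (or a yes/no reply to
    "Did that help?") and None when the user gave no feedback.
    """
    record = {"question": question, "answer": answer, "helpful": helpful}
    log.append(json.dumps(record))


def turns_for_review(log: List[str]) -> List[dict]:
    """Return the turns the user explicitly marked as unhelpful."""
    records = [json.loads(line) for line in log]
    return [record for record in records if record["helpful"] is False]
```

Keeping "no feedback" (`None`) distinct from "unhelpful" (`False`) matters here: only the explicitly disliked turns are pulled out for retraining.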
Copywriters are the new designers
One of the most critical elements to creating a great conversational solution is copywriting. The right words can keep your users engaged for hours while the wrong ones will leave them running for the hills. The writers must understand how to write engaging, emotional copy that draws users in. Conversational systems reflect the communication skills of their makers.
One thing that has worked in my experience is for the writers to think like they are writing a screenplay. First design the characters and have those characters interact with one another in ways their audience will find engaging and believable. This doesn’t mean they should write emotionally charged moments; instead, they should focus on micro-interactions.
Take with a grain of salt
Hope this helps in designing your next conversational solution. There isn't any set-in-stone design methodology for conversational solutions, and what we know is constantly changing based on new understanding and human behavior. Although adoption of messaging and conversational systems is exploding, we are still in the very early stages of applying this new way of interaction. What I have tried to capture here is what I have learnt from my first-hand experience in building conversational solutions for large enterprise clients and, of course, from referring to other people's experiences and their lessons learnt. Not all of what I have written may be applicable or relevant to every conversational use-case out there, and certainly there are things that I may not have captured or come across as yet.