Design: What’s Important For The Future of Voice Interfaces

A couple of days ago, I got myself an Amazon Echo. The device does an impressive job of showing me how limited today's voice interfaces are. That needs to change.
Pretty Box, Best Used as a Speaker. (Photo: Amazon)[/caption]
Here, I just want to mention that the Google Assistant wins any competition in that regard. Siri has a similarly horrible error rate, even though it is the oldest of the three technologies and should theoretically be the most advanced.
It sure is Pretty. And Did I Mention That Music Sounds Really Good? (Photo: Amazon)[/caption]
I also can't talk the way I want to. Alexa needs a sentence to be phrased the way she expects. Otherwise, she simply won't understand the command. The developers have prepared some alternative variants for typical input, but you still need to know them to get Alexa to reply. This is pretty nerdy, and very close to a toy. I guess there's a reason why setting a timer is one of the most used Echo functions.
So Far, Voice Assistants Are More Like Information Systems, Such as Encyclopedias or Dictionaries. (Photo: Google)[/caption]
Transferred to the voice assistant, this would mean that it needs some kind of short-term memory in order to handle information and context productively for a limited time. With Siri and the Google Assistant, we can already see early stages of this, for example when sending a WhatsApp message. Here, the assistant guides us through the process.
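To make the idea of a short-term memory more tangible, here is a minimal sketch in Python. It is not how Alexa, Siri, or the Google Assistant work internally. The class `ShortTermMemory`, the function `resolve_pronouns`, and the ten-minute expiry are all assumptions made up for illustration. The sketch remembers the last person mentioned, replaces "he" and "him" with that name in later commands, and forgets the person once the time limit has passed.

```python
import re
import time
from typing import Optional


class ShortTermMemory:
    """Hypothetical context store: keeps the last mentioned person for a limited time."""

    def __init__(self, ttl_seconds: float = 600.0) -> None:
        # Assumption: ten minutes is a reasonable lifetime for conversational context.
        self.ttl_seconds = ttl_seconds
        self._person: Optional[str] = None
        self._stored_at = 0.0

    def remember(self, person: str, now: Optional[float] = None) -> None:
        """Store a person; `now` can be passed explicitly to simulate time."""
        self._person = person
        self._stored_at = time.monotonic() if now is None else now

    def recall(self, now: Optional[float] = None) -> Optional[str]:
        """Return the remembered person, or None once the context has expired."""
        current = time.monotonic() if now is None else now
        if current - self._stored_at > self.ttl_seconds:
            self._person = None  # the conversation has moved on
        return self._person


def resolve_pronouns(utterance: str, memory: ShortTermMemory,
                     now: Optional[float] = None) -> str:
    """Replace 'he' and 'him' with the remembered person, if there is one."""
    person = memory.recall(now)
    if person is None:
        return utterance
    return re.sub(r"\b(he|him)\b", lambda _match: person, utterance,
                  flags=re.IGNORECASE)


memory = ShortTermMemory(ttl_seconds=600)
memory.remember("Paul", now=0)
print(resolve_pronouns("Send him a message", memory, now=30))   # Send Paul a message
print(resolve_pronouns("Send him a message", memory, now=700))  # Send him a message
```

A real assistant would need far more than a single slot and a regular expression, but the design choice is the same: context is stored with a timestamp and only used while it is still fresh.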
Today's Voice Assistant
"I'm not sure." "I don't know that." That sort of cluelessness is what I hear from Alexa, the voice assistant of the Amazon Echo, most of the time. This may be unsatisfying, but basically, it's only a problem of the available knowledge base. Add more substance to Alexa's backend, and this kind of answer will become much less common. So I don't want to keep focusing on the lack of knowledge in this article. [caption id="attachment_102951" align="alignnone" width="1000"]
Voice Detection Works Perfectly Fine
The actual issue with Alexa and all other voice assistants is a different one. It's not what used to be the biggest problem, voice detection itself. Thanks to fast cloud connections and sufficient computing power, detection is near perfect. Even telling apart different languages within one sentence is no longer a problem for the technology. The next big challenge limiting the success of voice technology in the form of conversational interfaces is the concept of conversation itself. Honestly, I have to admit that using my Echo does not really resemble a conversation.
The Interaction Resembles the MS DOS of the Eighties
Alexa reminds me of the early MS DOS. As if I were typing at a command line, I fire one sentence after another at the slim can. I always start my commands with the word "Alexa". The Echo doesn't respond to other requests. Without "Alexa", there's not much going on. This is different for some external skills, I'll admit, but the basic functionality is not smooth at all. [caption id="attachment_102948" align="alignnone" width="1000"]
The Voice Assistant of Tomorrow
In this article on t3n, I presented conversational interfaces as the dialogue systems of the future. In this and this article at Noupe, I looked into storytelling as the most important design element of future generations. I estimated that the switch from purely visual to voice-oriented design would take about ten years. Looking at Alexa, I may want to correct myself to 15 years. At least...
Let's get back to the previously mentioned big issue in conversational design via voice control. Complex processes, such as purchasing a product that can be configured with different options, cannot be handled with the current usage pattern. Here, an actual conversation with the technology is needed to achieve any results. The voice assistant has to be able to drive conversions. At the very least, it shouldn't be an obstacle to that goal. At best, it actively supports the process. I have yet to encounter a voice assistant that keeps me motivated and dedicated.
Creating that assistant will be a lot of work for designers. It's somewhat similar to designing a longer form. Both come with the danger of the user exiting at any time. The abandonment rates measured for common shopping cart flows prove this. It's also important to build a connection between the user and the system. At the lowest level, this connection starts with the system not asking for data that it already knows.
Context is King: The Voice Assistant Needs a Short Term Memory (At Least)
Let's imagine talking about Paul from the HR department with a colleague. After the first couple of sentences, we'll only use "he" and "him" to talk about Paul, without ever pointing out that we're still talking about Paul. Throughout the conversation, we build a context that we assume is known in the sentences that follow. This way, we even understand subtle innuendos. Ten minutes later, we've probably forgotten our conversation about Paul. [caption id="attachment_102950" align="alignnone" width="1024"]