Since Google demonstrated its Google Duplex capability yesterday, the reaction has been overwhelmingly positive, both to the technology itself and to its potential as a feature within Google Assistant.
“Allowing people to interact with technology as naturally as they interact with each other has been a long-standing promise,” says Google head of engineering Yossi Matias.
“The Google Duplex technology is built to sound natural, to make the conversation experience comfortable.
“We hope that these technology advances will ultimately contribute to a meaningful improvement in people’s experience in day-to-day interactions with computers.”
Should AI really pretend to be human?
But beyond the wish to create a better experience with artificial intelligence (AI), two concerns emerged: first, that such a capability within Google Assistant would make us pretty lazy and, second and more significantly, that Duplex could mislead the people being called on your behalf.
What wasn’t said during the demo is that Duplex could well tell the person being called that they are talking to a computer. Does that matter if the interaction is as natural as with a human?
That’s open to debate, though it was clear from the demo that Google has tried to make the experience as natural as possible, going a bit overboard with filler sounds such as “er” or “um” in the sample calls.
Humans use such speech disfluencies to build in thinking time, and that’s also the case here: they disguise the fact that the system is still thinking, too. Google adds that while we expect some things to be instantly answered – such as when we first say hello on a phone call – it’s actually more natural to have pauses elsewhere.
“It’s important to us that users and businesses have a good experience with this service,” continues Matias.
“Transparency is a key part of that. We want to be clear about the intent of the call so businesses understand the context.”
Google says it will experiment with the right approach “over the coming months”.
Here's the Google Duplex demo in action at yesterday's Google I/O keynote talk:
Can we trust AI yet?
Another problem with Duplex is that our experiences with virtual assistants and other voice control systems have led us to mistrust them. Or, at least, not trust them completely.
There’s the obvious concern that you might not get the result you wanted from a virtual assistant you’d tasked to book a table for you. Would it be at the right time and even in the right restaurant? Where the system detects that it hasn’t been able to get the desired result, Google’s idea is that it will be honest and flag this to you.
While there’s no rational reason these details should be wrong, the temptation as a human is to doubt that a virtual assistant would be able to get everything right. Would it really be able to interpret the nuances of language that accurately?
Google argued on stage at Google I/O, and again in the supporting Google Duplex blog post, that the idea behind the system is to carry out very specific tasks such as scheduling a hair appointment or booking a table. Unless trained to, it can’t suddenly call your doctor and start to have a chat.
This stuff is actually pretty complicated
Natural language is hard to understand, while the speed of conversation requires some pretty fast cloud computing power. People are used to having complex interactions with other humans that, says Matias, can be “more verbose than necessary, or omit words and rely on context instead. [Natural human conversations] also express a wide range of intents, sometimes in the same sentence.”
Google says that other challenges to the technology are the background noise and poor call quality that are a hallmark of many phone calls. People also tend to speak more quickly when talking to another human than they would if they thought they were giving voice commands to a computer.
Context is all-important, too, of course, and we tend to make contextual connections that computers traditionally don’t. So during a restaurant booking, the human might say a number which could mean the time or it could mean the number of people.
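To make the ambiguity concrete, here is a toy sketch in Python of how the question that came just before a bare number can decide what the number means. It is purely illustrative and not how Duplex works: the function name, the question labels and the rule itself are all assumptions invented for this example.

```python
from typing import Optional


def interpret_number(value: int, last_question: Optional[str]) -> str:
    """Toy rule (hypothetical): read a bare number using the preceding question."""
    if last_question == "party_size":
        # "How many people?" -> "Four" most plausibly means four guests.
        return f"party of {value}"
    if last_question == "time":
        # "What time?" -> "Four" most plausibly means four o'clock.
        return f"{value} o'clock"
    # With no context, the number alone cannot be resolved.
    return "ambiguous: ask a clarifying question"
```

The same word, “four”, produces a different reading depending on what was asked, which is the kind of contextual connection the article describes.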
Google says it is combatting these challenges with the use of a recurrent neural network, which is ideal for a series of inputs such as you would get during a phone conversation. The system still uses Google’s own Automatic Speech Recognition (ASR) technology and layers on the nuances of that particular conversation: what’s the aim of the conversation? What’s been said previously?
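For readers curious why a recurrent network suits a sequence of inputs, the sketch below shows a generic, textbook-style recurrent update in plain Python. It is not Google’s model: the function names, the tiny hand-picked weights and the single hidden unit are assumptions chosen only to show that the network’s state after the latest input depends on what came earlier.

```python
import math
from typing import List


def rnn_step(x: List[float], h: List[float],
             w_in: List[List[float]], w_rec: List[List[float]],
             bias: List[float]) -> List[float]:
    """One simple recurrent update: new state = tanh(W_in x + W_rec h + b)."""
    new_h = []
    for i in range(len(h)):
        total = bias[i]
        total += sum(w * xj for w, xj in zip(w_in[i], x))   # current input
        total += sum(w * hk for w, hk in zip(w_rec[i], h))  # previous state
        new_h.append(math.tanh(total))
    return new_h


def run_sequence(inputs: List[List[float]], hidden_size: int,
                 w_in: List[List[float]], w_rec: List[List[float]],
                 bias: List[float]) -> List[float]:
    """Feed inputs one at a time, carrying the hidden state forward."""
    h = [0.0] * hidden_size
    for x in inputs:
        h = rnn_step(x, h, w_in, w_rec, bias)
    return h


# Hypothetical weights: one input feature, one hidden unit.
W_IN = [[1.0]]
W_REC = [[0.5]]
BIAS = [0.0]

# Both sequences end with the same input, but the earlier input differs.
state_a = run_sequence([[1.0], [0.0]], 1, W_IN, W_REC, BIAS)
state_b = run_sequence([[0.0], [0.0]], 1, W_IN, W_REC, BIAS)
```

Because the hidden state is fed back in at every step, `state_a` and `state_b` differ even though the final input is identical, which is the property that lets such a network take account of what has been said previously.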
What are the benefits of Duplex?
There are several benefits to the Duplex technology, argues Google (beyond helping out busy people). Firstly, it could benefit businesses that don’t have online booking systems, as users can still book appointments online and will also get reminders about those appointments from Assistant, leading to fewer missed appointments.
Secondly, it could make specific local data online more accurate. Google cited the example of store opening hours in Google search at special times of the year. Could it ring up a local shop and ask for its Christmas opening hours, for example?
Google says that current human-computer voice interactions don’t engage in a conversation flow and force the caller to adjust to the system instead of the system adjusting to the caller.
And, of course, it could help those who have difficulty using the phone because of a disability.
Google says it will start testing the Duplex technology within Google Assistant this summer, specifically for booking the types of appointments outlined above.