7 days living with...voice control
"These aren't the droids you're looking for, move along." No, not that type of voice control – we’re talking artificially intelligent gadgets rather than Jedi mind tricks here.
The tech developers of the world would have us believe that dictation software is pretty polished. We’re all packing smartphones with Siri, Android voice commands and all sorts of other speak-and-spell apps, and we’re told that the next generation of smart TVs and games consoles will be voice activated and, indeed, some already are.
As a journalist with a slight fear of creeping tendinitis, I really should give it a try. But is it up to scratch enough to get me through a working week? Here is how my seven days with voice control unfolded.
It’s 9am - time to boil some eggs for breakfast. I boil the water, dunk the eggs and pick up my iPhone 4S.
“Set alarm for four minutes,” I tell Siri.
“Alarm set for 09:02,” it replies.
That's not what I asked. I try again. It persists, this time (am I imagining it?) rubbing it in.
“Alarm set for 09:02 - pretty soon!”
My eggs have now been boiling for about a minute, so I quickly return to old(ish) ways by firing up the built-in Clock app. Not a good start. "Set timer" is what I should have said, it turns out. Doh!
Time to do some writing, er, speaking. I've already loaded Dragon Dictate for Mac 2.5 on to my ageing iMac in readiness for this week. Setup took about 15 minutes, during which I had to read a few passages of text on the screen while the software got used to my voice. I don the headset and feel like I'm in a call centre.
After opening a Word document, I start dictating a TV review from some notes made the previous day. The hardest thing to get used to is punctuation COMMA which has to be voiced SEMI-COLON it just doesn't seem natural at all FULL STOP. In my opening paragraph there is just one error, "remotely" transcribed as "the mostly". Wow.
With straight text, this stunning success rate doesn't last. For starters, I can’t seem to say "via" without getting "wire" and "fire" unless I pronounce it "vee-a", and brand names aren't always recognised. I attempt to compare a TV with a Samsung model, but "some sun" and "Saint-Saëns" are all that come out. I utter the "scratch that" instruction for the first time, which instantly wipes what’s just been transcribed. Where's my keyboard?
Later that day, I walk to the swimming pool, and have an idea for a feature. Usually when this happens, I stop walking and send myself a quick email, but instead I attempt to dictate into the Notes app. I press the microphone icon to engage Siri, but wait until some passers-by are behind me before I start speaking. It takes at least 30 seconds to produce anything, and although it gets the words spot-on, it doesn't appear to understand punctuation. I love punctuation, but I hate talking to gadgets in public places. It’s all very weird.
Siri’s accuracy yesterday persuades me to give it a go for emails, so I check my inbox and start replying while I’m still in bed. It goes smoothly. I discover that the punctuation commands - "new line", "comma" etc - in Mail are the same as those I’ve been using for Dragon, and I get through five or six emails before work. It doesn't save me any time though, and I realise I’ve ruined the one sanctuary I had. I decide only to use Siri for dictation in dead time, and later on reply to an email while waiting for the kettle to boil. Much better.
Today I have a lot of research to do, and that means hitting Google. Usually it’s a frantic copying and pasting of product model numbers - the scourge of any tech journalist - but today it’s different since I’ve decided to perform every Google search with my voice. Cripes.
I try "Toshiba 40RL868 price". Amazingly, Google hears me correctly and within one click I’m on a dealer's website with the latest price showing. Brilliant. My next task is to find out some background about Songdo, a newly built "smart city" in South Korea, for another tech feature I’m working on. Here goes. "Songdo South Korea" elicits the rather amusing search terms "Syndrom", "Condos" and "Phone died South Korea" before I give up and have to type in the name using the virtual keyboard. Some definite limitations here.
Tonight I’m going out with friends, but need to check on the exact location. Google’s app answers the question "Where is the Aneurin Bevan Pub Cardiff?" by directing me to… ironmaiden.com, providing search results for "when are Bedford Par" and "cameron bergen park". Google doesn't know where I am, so can't cross-reference anything with what's around me, but that's not the main problem here - it has no idea what I'm talking about.
Once at the pub, I use the purest type of voice control to order a round of beers. While standing at the bar an email comes through that only needs a short reply, but I can’t muster the courage to reply by voice. Not in a pub. If they didn't chuck me out, I'd leave in disgust. More limitations, it seems.
Up early today and actually rather excited about the speed at which I'm beginning to produce features. Maybe if I can get today's writing done by lunchtime I can have the afternoon off?
After reminding myself to slow down while speaking, for the benefit of both Dragon and the Google app on my iPhone, a feature is finished by 11am. However, instead of quitting work for the week I decide to investigate Dragon’s other dimension - Commands - which I haven't yet had time to learn.
I notice that one of my reviews has just been published online, and I decide to Tweet it. I link Dragon to my Twitter account, @jamieacarter, but unfortunately Dragon’s "Post that to Twitter" command doesn't help: after I highlight some text, saying those words populates a 140-character box with a Tweet button. I need to include hyperlinks etc, so it’s not much use for me, though for anyone wanting to tell the world a random thought, it’s a completely hands-free experience.
Later in the day I finally get round to a task that’s been on my "to do" list for yonks, and utter "electrician" in earshot of my phone. As a result of yesterday’s searches, Google now thinks I’m in Bedford Park, which happens to be near Adelaide in South Australia. I’m given "local" maps and a Wikipedia entry. The "use my location" button within the Google app appears to be dead, so I enter my postcode and am given the names and one-touch phone numbers for 10 electricians in my area. Yes! We got there in the end.
I’m thinking of heading out to the Brecon Beacons tomorrow, but I’ve heard rumours of snow. "It’s not looking good in Brecon today through this Tuesday," says Siri, laying out a six-day forecast that confirms that, yes, it’s snowing. All weekend. Ah well, I guess I’ll watch Chelsea vs Manchester United instead. That’s on Sky, right Siri? "There’s some bad weather coming up for Manchester this Sunday," it tells me. That’s about as far as we get.
With Siri not only playing up but also completely devoid of location-specific information, I chance upon a review of the Evi app, pay the 69p fee and download it to my iPhone.
I fire it up and ask "her" a few questions. Presuming she knows where I am, I say:
“Find me a fish restaurant.”
She comes back with an unsatisfying, “Try opentable.com for restaurants.”
I try again, this time more specific. “Find me a fish restaurant in Cardiff,” but all Evi can muster is links for yell.com, yelp and 118.com. I give Evi a virtual thumbs-down, and it replies, “Cringe! I really hate letting you down!” Cringe indeed. I revert to speaking into Google on the iMac using the Dragon headset, and find a load of choices in seconds. Done.
As 5pm approaches, I try to check the football scores but yelling "latest football scores" into Evi produces just a "try goal.com for soccer". Soccer? Soccer! I hit the link and almost instantly get the scores.
Since it's Sunday, I don't sit at my desk and don't interact with gadgets much at all until 3pm, when I switch on my Xbox 360 in anticipation of Chelsea vs Man Utd on Sky Player. Watching Sky is what I mainly use my Xbox for, and since (decent) footie games aren't Kinect-compatible, I’ve never been much interested in buying a Kinect, but I have borrowed one from a friend for the purposes of this experiment.
I'm not a big fan of the new GUI on the Xbox, and nor does its dynamic, ever-changing content seem all that well suited to voice control, but the basic voice controls, such as "music", work okay on their own.
Lovefilm seems to have the most voice-friendly design: "Xbox last chance", "Xbox most viewed", and even "Xbox (title of film)" work fine, as well as "Xbox fast forward/rewind/pause". I start watching Source Code without touching the Xbox controller once. Even better is LastFM. I skip through various tracks and playlists purely by voice instructions while reading a magazine, but the Kinect then seems to go deaf. Too much background music?
"Xbox Sky" launches Sky Player for the start of the match, and after the footie is finished, I shout "Xbox play disc" and the in-situ Pro Evolution Soccer 2012 loads up. I mournfully return to my controller, and I’m reminded that voice control is in its very early days if it can’t even manage the likes of "pass", "shoot" and "two-footed tackle with studs showing".
This week is stacked with work, but I’m not sure what to prioritise. "What am I doing this week?" causes Siri to produce a complete list of my appointments, though it would help if I started putting everything in my schedule - you only get out of Siri what you put in. So I take five minutes to tell Siri all about the deadlines I have in my diary for the rest of the week, just by saying "schedule", "deadline", telling it the time, and then changing the title of each entry to relate to a specific piece of work. It’s a bit stop-start, but far quicker than using Google Calendar on my iMac (which is synced with my phone).
I fire up Dragon, then the phone rings. I answer it, come up with a brilliant idea for an editor (probably) and return to my work. What's this on the page?
"Fpblurred and you are in a while a sunglasses and a are you a year is a is a disease there are 'sntary and, at just 1no whizz with it."
What the ... ?!?! I forgot to turn off the microphone, sure, but I don't remember mentioning sunglasses or diseases. That deleted, I finish two features I’d not expected to, though my proof-reading is done in a rather paranoid state. Actual "writing" is definitely quicker, but editing is taking longer.
A pesky cheque has appeared in the post and, although I know perfectly well where my bank is, I decide to voice my plans. Evi didn't impress me on Friday, but Siri is no help on matters of UK geography, so I give Evi another try - and it's just as well because "Where is the nearest Lloyds Bank?" prompts not only a link to a Google map but also a spoken address right down to the postcode. Impressive, though there’s no device control here, so I can’t use Evi to send a text or make a phone call.
Just before end of play someone from a PR agency rings up to offer me a loan of a pair of binoculars for a future article on Pocket Lint, and she emails me a loan form. In complete silence, I then download the file, print it, sign it, scan it, attach it to an email, and send it back. How awfully manual. If only the printer could understand me.
I still can’t get used to switching off the headset when I’m not dictating, resulting in another abstract gem on the page, this time where a headline ought to be: "handy ladybird left the surgery is methodical and tonic productivity you come to see that were widely crap gadgets." Lovely stuff.
Did speaking rather than typing save time? No, not really. Although I frequently managed to rattle off at least six paragraphs without taking a break, the need to go back and alter numbers and the occasional punctuation mistake - not to mention the general editing process in terms of content and style - means it takes roughly the same time to produce the same amount of work.
However, speaking did change the tone and style of what I produced. The first rule of journalism is to avoid clichés but, reading back through my spoken reviews, I noticed a few have crept in. They were hardly peppered with "at the end of the day" and "like", but they're definitely closer to my speaking habits than to my writing style.
Siri is a good personal assistant and just about reason enough to upgrade to an iPhone 4S; ditto Evi, which complements Siri fairly well for now, though both ought to drop the character act. Sarcasm from a glorified search engine seems just plain odd and begins to wear thin very quickly.
The Kinect, we liked, but until voice control is integrated into top-line games its only purpose can be convenience. I’m not convinced it achieves that yet, though flicking through music on LastFM while making dinner is a good start.
For now, it's the highly accurate voice recognition software from Dragon that impressed us most during the 7 days which saw our keyboard gather dust. The rest, so far, is a stop-start novelty.