No Hands

May 29th, 2008

I first watched Apple’s Knowledge Navigator concept video a year ago, and I have watched it many times since. Every time, I still get that childlike feeling of wonder in my stomach, the feeling that accompanies a rare thought: “This is the future.”

Apple, under John Sculley’s direction, released the Knowledge Navigator video in the late 1980s, yet the device looks just as revolutionary today as it did then. We have realized bits and pieces of it: Apple’s MacBooks and iMacs have built-in iSight cameras and microphones for easy videoconferencing, we can give remote presentations through iChat, and we have access to a more thorough and accurate encyclopedia than anyone could have imagined in 1987.

But videoconferencing, remote presentations, and even a huge database of hyperlinked knowledge are not what is most inspiring about the Knowledge Navigator video.[1] What is most revolutionary, and frankly jaw-dropping, is the Navigator’s interface: voice.

The Navigator truly understands human speech. It not only knows the definitions of words and the grammatical rules of English, but it can also make sense of often complicated speech. The Navigator asks the professor a question, and he off-handedly responds, “Mmmhmm – yes,” and the Navigator has no difficulty understanding his answer. The professor quite literally converses with the Navigator; as he is walking out of the room, he instructs it on how to respond to specific people who may call while he’s away.

There are no commands. There is only human speech.

Unfortunately, we are not even close to true understanding.

But after remembering Leopard’s new voice, Alex, and having it read a few things for me (which it does quite well), I decided to try 10.5’s speech recognition, an under-discussed feature in Mac OS X. Although it is not perfect, it is quickly becoming a useful part of how I use my Mac.

Mac OS X’s speech recognition has two great qualities. First, it does not need to be trained. You can enable it and begin using it immediately; there is no period for it to learn your voice.[2] Second, it is adaptable. Not only can it be expanded (more on this later), but it does not require that you repeat commands verbatim. For example, if you would like to invoke the command “What time is it?”, you can also ask, “How late is it?” This also helps when you mispronounce something, as I found out.
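To make the idea of one command with several accepted phrasings concrete, here is a toy Python sketch. It is only an illustration: the command names and the table of phrasings are hypothetical, it works on typed text rather than audio, and it says nothing about how OS X actually implements speech recognition.

```python
import re

# Hypothetical command table: each command lists a few accepted phrasings.
# These names and phrasings are illustrative, not taken from OS X.
COMMANDS = {
    "tell_time": ["what time is it", "how late is it"],
    "get_mail": ["get my mail", "check my mail"],
    "switch_to_safari": ["switch to safari"],
}


def normalize(utterance):
    """Lowercase the text and drop punctuation so 'What time is it?' can match."""
    return re.sub(r"[^a-z0-9 ]", "", utterance.lower()).strip()


def match_command(utterance):
    """Return the name of the command that lists this phrasing, or None."""
    spoken = normalize(utterance)
    for name, phrasings in COMMANDS.items():
        if spoken in phrasings:
            return name
    return None
```

With this table, `match_command("What time is it?")` and `match_command("How late is it?")` both return `"tell_time"`, while a phrase that appears nowhere in the table returns `None`.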

Rather than just discuss what voice recognition can do (which you can find out by opening System Preferences and then the Speech pane), I am going to take you through how I use my Mac every day, and how speech recognition makes it more enjoyable.

Morning

When I wake up in the morning, the first thing I do is check my email and feeds (lame, I know). I open my MacBook Air and say, “Get my mail.” OS X then opens or switches to Mail.app, and I go through and read new emails. Sometimes I even feel like responding, so I say “reply,” which does precisely what you would expect: Mail opens a new message with everything you would get if you clicked “reply” yourself. After typing out the message, I say “send” and OS X handles it all for me.

After catching up on email, I decide to read my feeds. “Open NewsFire,” and I’m off. After opening the articles I would like to read in Safari, I say “Mark all Read,” “Close this window,” which closes NewsFire, then “Switch to Safari,” which brings up the articles.

Sometimes Daring Fireball gives us a great and lengthy article, so instead of keeping my hand on the touchpad or keyboard, I say “move page down,” and Safari dutifully complies. After clicking on a link in Gruber’s article, though, I’d like to move back and finish his article. “Back” does just that.

I finish Gruber’s article, so it is time to move on to the next item in my feed list. I instruct the computer, “Close this window.” Gruber’s article closes and the next Safari window is brought to the front, ready to be read.

After finishing reading my feeds, I decide I’d like to check the news. I have Yahoo’s home page in my bookmarks bar, so I say “Yahoo” and there it is.

All of this with just my voice. This is not particularly groundbreaking, but it makes my mornings easier and more enjoyable. There is something satisfying about reading and replying to your email and reading your feeds, all without touching the touchpad or using a mouse.

Work

After getting a quick breakfast, it is time to do some work. I am going to need a few apps open, so I say, “Open iTunes,” (work without music? That would be a shame), “Open Coda,” and “Open Preview.” Time to get started. “Switch to Coda” brings it up, and I need to open the Sites pane in Coda. “Sites” brings that up. Now I have my HTML and CSS pages open for a particular site I am working on. After editing a page, I need to make sure it looks correct. I say “Preview,” and lo and behold, it does just that. It does not look quite right, though; I need to fix it. I say “Edit,” and I am back in the edit pane, ready to do some HTML gymnastics.

Well, the page is all done, but I need feedback from the company I am working with. Instead of clicking on Mail.app’s dock icon, then clicking “Compose new message” and typing out everyone’s names, I just say “Mail to [company name].” Mail.app opens a new message with everyone’s address in the To: field. I type my subject and message, and say “send.” Done.

With web design done for now, I need to finish up an article I am writing, so I say “quit from Coda,” and “open MarsEdit.” Ready to go.

Break

Time to grab some lunch. I’d like to go to Panera Bread with Kristen, so to see if she has some time I say, “Chat with Kristen.” OS X opens iChat and if she is online, it opens a chat window with her. Unfortunately, it tells me she’s not online, so I need to call her. It’s been a while since we have talked, so I forgot her phone number. No problem; “Phone for Kristen” brings up a large overlay with her phone number, and you can even have the computer read it to you.

I call her, and she cannot have lunch today, but she can tomorrow. Just so I do not forget, I say “Lunch with Kristen tomorrow,” and the computer adds it to my iCal calendar.

Well, I am off to get some lunch, so I say “Start the screensaver,” and I am ready to leave.

No Hands

OS X’s voice recognition is not perfect. Sometimes I must repeat a command, but rarely do I need to repeat it more than twice. For the convenience it creates, though, it is worth it; I need to use the touchpad and keyboard much less than I normally would. But more important, it makes using a computer more enjoyable and even stimulating. Rather than sitting in front of your computer and using it silently, you are actively engaging with it. You use a part of your brain you would not normally use while working, and I find that keeps me more alert and creative.

We may not have Knowledge Navigators yet, but we do have voice recognition built into our Macs, which can make work a little easier.

  1. Although its ability to effortlessly access all scholarly articles and, I would assume, data and any other kind of knowledge, implies decentralized systems interfacing with each other over open knowledge-sharing standards, which is an incredible concept worthy of another article.
  2. If you are interested in how this works, O’Reilly has an excellent article from 2004 on the technical side of OS X’s voice recognition, and how you can use it effectively.