The Amazon Echo, a voice-driven computer that sits on a tabletop and answers to the name Alexa, can call up music tracks and radio stations, tell jokes, answer simple questions and control smart appliances. Even before Christmas it was already resident in about 4% of American households. Voice assistants are being widely used in smart phones, too: Apple’s Siri handles over 2 billion commands a week, and 20% of Google searches on Android-powered handsets in America are input by voice. Dictating e-mails and text messages now works reliably enough to be useful. Why type when you can talk?
Simple though it may seem, voice has the power to transform computing, by providing a natural means of interaction. Windows, icons and menus, and then touch screens, were welcomed as much easier ways to deal with computers than entering complex keyboard commands. But being able to talk to computers abolishes the need for a “user interface(界面)” at all. Just as mobile phones were more than existing phones without wires, and cars were more than carriages without horses, so computers without screens and keyboards have the potential to be more useful, more powerful than people can imagine today.
Voice will not wholly replace other forms of input and output. Sometimes it will remain more convenient to converse with a machine by typing rather than talking (Amazon is said to be working on an Echo device with a built-in screen). But voice is sure to account for a growing share of people’s interactions with the technology around them, from washing machines that tell you how much of the cycle they have left to virtual assistants in corporate call centres. However, to reach its full potential, the technology requires further breakthroughs and a resolution of the tricky questions it raises around the trade-off between convenience and privacy.
Computer-dictation systems have been around for years. But they were unreliable and required lengthy training to learn a specific user’s voice. Computers’ new ability to recognise almost anyone’s speech dependably without training is the latest manifestation (证明) of the power of “deep learning”, an artificial intelligence technique in which a software system is trained using millions of examples, usually selected from the Internet. Thanks to deep learning, machines now nearly equal humans in transcription accuracy, computerized translation systems are improving rapidly and text-to-speech systems are becoming less robotic and more natural-sounding. Computers are, in short, getting much better at handling natural language in all its forms.
Although deep learning means that machines can recognize speech more reliably and talk in a more natural manner, they still don’t understand the meaning of language. That is the most difficult aspect of the problem and, if voice-driven computing is truly to flourish, one that must be overcome. Computers must be able to understand context in order to maintain a coherent conversation about something, rather than just responding to simple, one-off (一次性的) voice commands, as they mostly do today (“Hey, Siri, set a timer for ten minutes”). Researchers in universities and at companies are working on this problem, building “bots” that can hold more detailed conversations about more complex tasks, from searching for information to making travel arrangements.
Many voice-driven devices are always listening, waiting to be activated(激活). Some people are already concerned about the implications of internet-connected microphones listening in every room and from every smart phone. Not all audio is sent to the cloud - devices wait for a trigger phrase (“Alexa”, “OK, Google”, “Hey, Cortana”, or “Hey, Siri”) before they start passing the user’s voice to the servers that actually handle the requests - but when it comes to storing audio, it is unclear who keeps what and when.
4. According to Paragraph 1, the Amazon Echo ________.
A. has been sold out before Christmas
B. has been used by most American families
C. came on the market later than Apple’s Siri
D. is more useful than smart phones in dictating e-mails
5. What can we learn about computers’ deep learning from the passage?
A. It is vital to accurate identification of human voices.
B. It is almost the same as the computer-dictation system.
C. It has helped machines understand the meaning of language.
D. It has helped machines beat humans in accuracy and reliability.
6. What are some users of voice-driven devices concerned about?
A. The devices will be in charge of their life.
B. The devices need to be activated before working.
C. They are in the dark about their data’s ownership.
D. Their voices can be recognized by every smart phone.
7. What’s the author’s attitude towards voice-driven technology?
A. Worried.
B. Doubtful.
C. Supportive.
D. Objective.