The Future of Voice, Speech Recognition by Lance Winslow

Experiencing the new paradigm shifts in technology will require humans to become one with the technologies they create and to interface with them in real time. Perhaps the greatest step towards this goal is Voice Recognition Technology, which lets humans communicate in the way that evolution made natural to them - vocal cords and speech. Thus, they are able to interface with the tools they have created.

Today, when you buy a new computer with the Microsoft Vista Operating System pre-loaded, it comes with Windows Speech Recognition. Even if you have an older computer, there are many good Voice Recognition products available, such as Nuance's Dragon NaturallySpeaking. This article is being written using version 8.1; soon I will upgrade to version 9.1, which is said to be even more accurate, at up to 99%. This is a huge improvement over my first try at speech recognition software, IBM VoiceType Dictation 3.0, which I purchased back in 1995, twelve years ago.

Over the past couple of years, I have worn the letters off three different laptop keyboards. Perhaps I do not trim my fingernails as often as I should, or perhaps it has something to do with the fact that I write 4,000 to 14,800 words a day, pounding out articles on those plastic keys. Either way, for me Voice Recognition Software ranks among the greatest inventions of mankind. Technologies that increase productivity and efficiency are the most significant, and at this point I give thanks to Ray Kurzweil for his contributions to Voice Recognition.

Voice Recognition Software has enjoyed good R&D expenditures, and the number of applications that entrepreneurs are finding for this technology has also fueled that fire going forward. You only need to read through a few issues of Speech Technology Magazine to get an idea of how fast things are moving right now. There are rapid advances in, and new uses for, this software in nearly every major industry:

· Transportation,

· Communication,

· Energy,

· Education,

· Military,

· Mining,

· Manufacturing,

· Policing,

· Prisons,

· Courts,

· Construction,

· Disaster Relief,

· Space

These industries use Voice Recognition Tools for Corporate Customer Relations, Training, Design, Project Management, Public Relations, Advertising, Word Processing, Data Mining, Writing, Translation, Recording and Machine Interface, to name just a few. These applications have improved efficiency and saved time, and that translates, at least for the corporations that employ the technology, into quarterly profits and improved shareholders' equity. It has all been very well received.

No, it has not been perfect, and yes, there are still kinks to work out. There are many different dialects, regional variations of accents, and countless languages, some rather obscure. Indeed, there are shortages of top-notch specialists in the field. But Voice Recognition is crossing the digital divide and preventing unnecessary political impasse; that means fewer conflicts, fewer wars and a safer world as well. Can one technology really do all that? Yes, and the United Nations is also employing these tools for that very important common cause of humanity - peace.

Corporations are now using CRM Voice Software that can pick up customer intent, emotions and feelings just by studying the patterns in the voice, the pauses between words, and the inflection of the words spoken. This means that someone talking to a call center computer is more fully understood, and the company can derive more information about satisfaction levels and customer service and improve the quality of its products and services faster. Politicians are also starting to use this for constituents who call in. Of course, these technologies have great value to the military for threat assessment as well, protecting human civilizations from terrorism.

Voice recognition software programmers and engineers have devised ways to give each word, phrase and sentence values of intensity, volume and variation, but you can see how quickly that can complicate itself; it is not an easy task. However, in doing so, voice recognition can also get very close to identifying the unique speaker. This has applications as well: training the software for accuracy among users, switching users without being instructed, and bettering accuracy with unknown users. Each year voice recognition has improved - the future of voice recognition is here.
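To make the idea of giving a phrase values for pauses, volume and variation a little more concrete, here is a minimal sketch in Python. It is only an illustration: the `Word` record, its fields and the `phrase_features` function are hypothetical names made up for this example, and the sketch assumes some recognizer has already supplied per-word timing and loudness. It is not how any commercial CRM product actually works.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Word:
    """One recognized word (hypothetical record, for illustration only)."""
    text: str
    start: float   # time the word begins, in seconds
    end: float     # time the word ends, in seconds
    volume: float  # loudness in arbitrary units, assumed to come from the recognizer


def phrase_features(words):
    """Summarize a phrase: pauses between words, average volume, volume variation."""
    if not words:
        raise ValueError("need at least one word")
    # A pause is the gap between the end of one word and the start of the next.
    pauses = [max(0.0, nxt.start - cur.end) for cur, nxt in zip(words, words[1:])]
    volumes = [w.volume for w in words]
    return {
        "mean_pause": mean(pauses) if pauses else 0.0,
        "longest_pause": max(pauses, default=0.0),
        "mean_volume": mean(volumes),
        "volume_variation": pstdev(volumes),  # population standard deviation
    }


phrase = [
    Word("I", 0.0, 0.2, 60.0),
    Word("am", 0.3, 0.5, 62.0),
    Word("upset", 1.0, 1.5, 70.0),
]
features = phrase_features(phrase)
```

A long pause before a loud word, as in the sample phrase, is the kind of pattern such a system might flag. Deciding what the numbers actually mean about a caller's mood is the hard part, and it is not attempted here.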

Voice Recognition software is slowly replacing court reporters, a profession with huge staffing shortages in many areas; some courts are now using Voice Recognition instead. Police Departments and US Troops in Iraq are using Tablet PCs and PDAs that can interpret what is said and deliver that sentence by voice to the other party in their language. When the other party replies, their words are translated back into English and written on the screen.

Voice Recognition lends itself well to One-on-One instruction for students, using Avatars to help with teacher shortages, help special-needs kids and assist in computer training. Corporate Training is a big application as well, allowing complete interface with the trainee or student, which is perfect for educational purposes. And these are only some of the applications that are in use or in development right now. The question is, what lies in store for the future? Let's look five years out, see the road ahead and ask ourselves what else we see just beyond the horizon.

It is time to start discussing some of the potential future applications or 'killer apps' that are now possible or will be shortly, along with the issue of funding the research going forward. We must consider how other complementary Artificial Intelligence advances will hyper-advance Voice Recognition performance. If you have ideas or concepts for the future of Voice Recognition, we should talk. Below are just a few of the concepts that come to my mind while thinking on this subject today; perhaps you have some too. Indeed, there really is no end to the potential, and we are limited only by our imaginations.

Future Advances for Voice Recognition:

1. Body Language + Facial Expression + Voice Recognition:

Currently, there are robotic android projects in the works in Japan and in the US, and facial expression, or mirroring, is very popular. The goal is for the human who interfaces with the system to create an emotional bond with the machine. Voice Recognition systems that also read body language and facial expression could also be used for threat assessment at, let's say, airports and border crossings, and could replace human workers at those locations or choke points.

If you smile at a robotic android and it smiles back at you while you are having a conversation, this ups the emotional value of the conversation for the human. Perhaps the system might start complimenting you. If you are persnickety with the system, maybe it will mirror those responses, reciprocate with an angry response, or work to defuse the situation. Of course, it all depends on its programming, but you can see the advances, the potential applications and the trends going forward.

If you will recall HAL, the famous science fiction computer, it said: "I sense hostility in your voice, Dave." Perhaps because this once appeared in a work of science fiction, human scientists today are trying to make it so. Right now, we are there; we have this technology. CRM Voice Recognition Software can sense emotion, hesitation, aggression, hostility, anger, etc. So, within five years we will see these features in more and more applications.

Haptics is another field of science that lends itself well to the merging of Voice Recognition and facial-feature emotional recognition. Perhaps robots of the future will look like humans and mimic their characteristics. A robot that feels a strong handshake and firm grip, along with the self-confident voice of an individual with an earned ego, might elevate the trust factor a notch or two.

If such systems can size up confidence in an individual's ability to perform, will voice recognition software combined with these other technologies replace corporate Human Resource Directors, Project Managers, Middle Managers and CEOs too? Folks are already thinking along these lines, 10-15 years out, but not without ruffling a few feathers. Being replaced by a computer, robot or system has caused many a conflict in the past, so the plot thickens and more barriers are foreseen.

2. Emulation of Emotion and Empathy:

Emulating emotion and empathy is on its way right now. Currently, most consultants on artificially intelligent customer response systems for call centers advise that if the voice on the other end is coming from a machine, the caller should be able to identify it easily as a computer system with voice recognition features, because humans do not like to be tricked, and finding out upsets them. Of course, with the advent of emotion and empathy emulation, passing for a human is possible, and we have the ability to do this now.

Indeed, artificially intelligent computers have been used to go online and participate in forums, and can participate for 15 threads or more without detection. In voice recognition, if the voice sounds legitimate, a full conversation can go on for a while without the human realizing they are talking to a machine.

With a call center system handling a complaint, a computer system might side with the customer, listen to them and even say:

"I know how you feel, I am so sorry this has happened, let me see what I can do," or

"Yes, I understand, this is very urgent, let me have you talk to my supervisor,"

then pass the customer off to a real human, or perhaps to another voice system with a more authoritative voice. The customer on the other end of the line may never know that they are talking to a computer or computers. Indeed, this does not sit too well with many in the industry, but it is a direction that voice recognition software professionals are thinking about and discussing now. Surely you can see applications for this.

For instance, think of how well such applications lend themselves to Crisis Hotlines, Online Self-Help Websites or even Computer Systems to assist At-Risk Kids. What about the Catholic Church? Confess to an AI Priest and keep your secret safe (just kidding). But who knows what applications folks might come up with for these advances in emulation of emotion and empathy?

3. Understanding a Joke and Responding with Another One:

Artificial Intelligence is getting better all the time. Soon, AI software engineers will create joke recognition systems, where the computer will understand irony and know when the human is telling a joke, then reciprocate with a joke of its own, perhaps creating one from scratch. The system would be pre-loaded with all the jokes common to human interaction in all cultures. It would be able to pick one that has most likely not been heard by the human it is working with at the time, and also record in memory that the joke has been told to that individual so that it does not repeat it.
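The easy part of that idea, remembering which jokes have already been told to which person, can be sketched in a few lines of Python. This is a minimal illustration only: the `JokeMemory` class and its method names are hypothetical, and it tracks only the jokes this one system has told. Whether the person has heard a joke elsewhere, or whether the joke is any good, is well beyond it.

```python
from collections import defaultdict


class JokeMemory:
    """Remembers which pre-loaded jokes have been told to which listener (illustrative)."""

    def __init__(self, jokes):
        self.jokes = list(jokes)
        # Maps a listener's name to the set of joke positions already told to them.
        self.told = defaultdict(set)

    def next_joke(self, listener):
        """Return a joke this listener has not yet been told, or None if none remain."""
        for index, joke in enumerate(self.jokes):
            if index not in self.told[listener]:
                self.told[listener].add(index)  # record it so it is never repeated
                return joke
        return None


memory = JokeMemory(["first joke", "second joke"])
first_for_dave = memory.next_joke("Dave")    # "first joke"
second_for_dave = memory.next_joke("Dave")   # "second joke"
nothing_left = memory.next_joke("Dave")      # None: Dave has been told them all
first_for_ann = memory.next_joke("Ann")      # "first joke": Ann has her own record
```

Each listener gets a separate record, so a joke that is stale for one person is still fresh for the next.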

Wow, this is getting complicated fast, isn't it? And this is exactly why it has not been fully achieved. Humor has been a huge stumbling block for voice recognition and artificial intelligence systems, yet it is something that humans have a knack for. Still, they are working on this challenge, and within 5 to 10 years the AI software folks will have that problem licked.

This will mean advances in companions for humans on long-term space flights, help with rehabilitation, and less tension for humans working alongside robotic partners or assistants as the transition to mixed robot and human workforces takes place. Since robots will be working with and assisting humans, it will be necessary to keep the peace to foster cooperation.

4. Vocal Cord Vibration Recognition + Current Voice Recognition:

Currently, there is advanced research in the US Military that allows vocal cords to be read without actual speech or voice; these systems are working now. This is done with a device near the larynx that picks up slight vibrations and is coupled to a transmitter. The receiving party, another special forces member, wears a tiny earpiece to hear that speech, which stays silent to anyone nearby, even within six inches of those using the system. This is getting pretty close to mimicking thought transfer, but in essence it is a form of voice recognition hooked to a communication device.

These systems will get much better, and soon Secret Service members, special forces and SWAT teams will no longer have little cords coming out of their ears; they will communicate without notice. The larynx vibration speech recognition device might be mounted inside a clip-on tie, and no one will be the wiser. There are many applications for this if you think on it.

Applications:

Today, in considering this question, I wrote down some potential industries and uses where these technologies will be needed and desired, and which would warrant R&D expenditures. Some of these ideas are borrowed from general knowledge, articles, papers and/or think tank conversations; others are off the top of my head. These are merely the natural progression and evolution of voice recognition. The bigger question is where YOU see the future of voice recognition - what say you?

CAD Design Assistant

Cell Phone w/Interactive Features

Communication with Dolphins

Death of TV Remote Control

eGovernment Interactivity with the eCitizen

FAA Control

Flight Controls

Interactive Internet Searching

Interactive Online Books

Interactive Shopping Carts

Intercepting Terrorist Communications

Rehabilitation Companion

Robotic Space Arm Control

Telephone and Kiosk Ordering Systems

UAV Voice Control

Video Game Interface

Virtual Reality Voice Recognition Entertainment

Wrist Watch All-in-one PDA, Cell Phone, Video Phone, Music System w/ no buttons

Self-Driving Car Interface

Sources:

1. Microsoft Vista Operating System w/Voice Recognition:

http://www.microsoft.com/enable/products/windowsvista/speech.aspx

2. Dragon NaturallySpeaking 9.1 Voice Recognition Software:

http://www.nuance.com/naturallyspeaking/

3. IBM VoiceType Dictation 3.0 for Windows 95 and the New IBM ViaVoice:

http://www2.edc.org/NCIP/vr/VR_VoiceType.html

http://www-306.ibm.com/software/pervasive/embedded_viavoice/

4. Other Competing Voice Recognition Software for Word Processing:

http://www.consumersearch.com/www/software/voice-recognition-software/comparison.html

5. Honoring Kurzweil's contributions to Voice Recognition:

http://www.nfb.org/Images/nfb/Publications/bm/bm00/bm0003/bm000311.htm

6. Speech Technology Magazine, articles, white papers, research links:

http://www.speechtechmag.com/

http://www.speechtechmag.com/Archives/Default.aspx?ContextSubtypeID=133

7. Challenges of Voice Recognition:

http://users.ece.gatech.edu/~chl/ngasr03/chair-rabiner.pdf

http://www.msri.org/publications/ln/hosted/nas/2002/rabiner/1/index.html

http://www.nist.gov/speech/test_beds/mr_proj/

http://www.clsp.jhu.edu/seminars/abstracts/F1999/juang.html

http://www.ewh.ieee.org/r10/bangalore/sps/html/spl/2007spl02.htm

http://ling.uta.edu/~laurel/NYTmachine-prose.pdf

8. Voice Recognition for Military:

http://www.usatoday.com/tech/news/techinnovations/2007-04-02-ibm-donation_N.htm

http://www.stormingmedia.us/11/1170/A117034.html

9. Voice Recognition PDA Translation Device:

http://www.cs.cmu.edu/~awb/papers/eurospeech2003/speechalator.pdf

http://www.wired.com/science/discoveries/news/2005/11/69537

10. Challenges Voice Recognition in Court Reporting Open Dialogue:

http://www.robson.org/gary/writing/cr-speechrecognition.html

By Lance Winslow
A Retired Franchisor
Consultant Brain-4-Hire
Internet Writer and Author
Online Think Tank Coordinator
Former Track and Field Athlete "miler"

*This article was originally intended for a personal column on the award-winning The Future of Things website: http://www.TFOT.info . It is perhaps worth a visit for all the latest technological advances of Humankind.
