All About Speech Recognition Software

Talk, not type, to your computer
 

The omnipresent eye of the HAL 9000 computer in 2001: A Space Odyssey introduced the world to modern speech recognition in 1968. What was science fiction then is close to science fact now.

This is the first part of a series on speech recognition software.  See the related articles listed at the end of this page.

 

 

Reliable speech recognition has long been sought after, but only recently has it become practical on ordinary computers.

The extraordinary computing power of a modern home computer and the evolving capabilities of speech recognition software now offer the promise, and possibly the reality, of effortlessly controlling and communicating with and via one's computer merely by talking normally to it.

Read through this and the rest of our five-part series to understand what speech recognition is now capable of, whether it might be suitable for you and your needs, and how best to use it in your own work environment.

A Short History of Speech Recognition

Speech versus voice recognition

First, perhaps we need to define some terms.  We are using the term 'speech recognition' to refer to a computer being able to listen to an ordinary speaking voice and understand the words and sentences being spoken.

Voice recognition is something different.  We consider voice recognition to be the ability to hear someone speaking and identify the person whose voice is being heard.  This process is completely different, and the process of voice recognition may not actually involve understanding any of the words, but might be just limited to recognizing the voice.

This article is all about speech recognition, not voice recognition.

Slightly more than 40 years of history

Speech recognition technology has often been incorporated into science fiction, but for a long time it seemed as fanciful and impossible as death rays and faster than light interstellar travel.

Death rays are now a reality.  Faster than light travel - at least at the subatomic level - is becoming a possibility, and after 40 years of hard slog, so too is speech recognition.

Of course the greatest enabler of modern speech recognition capabilities is the ever increasing computing power of a modern computer.  But even limitless computing power would be useless without the appropriate programming to drive a speech recognition capability.

AT&T's Bell Labs developed the first-ever speech recognition device way back in the late 1940s and early 1950s.  But this was more a proof of concept than a practical device that could be deployed in the real world.  Until the late 1960s, the focus was on developing systems that would recognize 'discrete' words - that is, words spoken separately and distinctly.  (A fascinating and detailed history can be found here.)

While such systems might have some limited application in specialized fields, modern 'continuous' speech recognition capabilities first started to be developed in the early 1970s.  That was when research into the theoretical concepts that allow for speech recognition, developed at Princeton University, was taken up by several contractors to ARPA (the Advanced Research Projects Agency -- the same agency that brought us the Internet).

Some of the underlying theory

In case you wondered, the underlying theory involves using a technique known as 'Hidden Markov Modeling'.  This is a way of identifying something without actually seeing the thing itself, by working out what it most probably is, based on other things associated with it.  For example, if you wondered what the temperature was outside, and you saw a person walking down the street wearing only a T-shirt and shorts, you might reasonably infer that it was warm.
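The T-shirt example can be turned into a very small working model.  The sketch below is only an illustration, not how any particular product works: the two weather states, the three outfits and every probability in it are invented.  It uses the standard Viterbi method to find the most likely sequence of hidden states (the weather) from a series of things that can actually be seen (the clothing).

```python
# Toy hidden Markov model: infer the hidden weather from observed clothing.
# All states, observations and probabilities below are invented for illustration.

STATES = ("warm", "cold")
START = {"warm": 0.5, "cold": 0.5}
# Probability of moving from one hidden state to the next (weather tends to persist).
TRANS = {
    "warm": {"warm": 0.8, "cold": 0.2},
    "cold": {"warm": 0.2, "cold": 0.8},
}
# Probability of seeing each outfit given the hidden state.
EMIT = {
    "warm": {"t-shirt": 0.7, "sweater": 0.25, "coat": 0.05},
    "cold": {"t-shirt": 0.05, "sweater": 0.35, "coat": 0.6},
}


def viterbi(observations):
    """Return the most probable sequence of hidden states for the observations."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (START[s] * EMIT[s][observations[0]], [s]) for s in STATES}
    for obs in observations[1:]:
        new_best = {}
        for s in STATES:
            # Choose the most probable way of arriving in state s.
            prob, path = max(
                ((best[p][0] * TRANS[p][s], best[p][1]) for p in STATES),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * EMIT[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


print(viterbi(["t-shirt", "t-shirt", "coat"]))  # ['warm', 'warm', 'cold']
```

In a speech recognizer the hidden states would be the sounds or words being spoken and the observations would be measurements of the audio, but the principle of inferring what cannot be seen from what can is the same.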

The magic of this with speech recognition is that it enables a computer to imprecisely identify words, and then to 'fill in the gaps' based on the words surrounding each word, more or less the same way we do ourselves when we are listening to someone speak.  The context of a word gives clues as to which word it is - particularly with words that sound the same (for example, consider the phrase 'He gave two balls to the other boy too' - with three different words to/too/two all sounding the same but, based on context, being clearly different).
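As a rough sketch of how context can settle the to/too/two question, the following code scores each candidate by how well it fits between its neighbouring words.  The counts are made up for this example (a real system would learn such statistics from a very large body of text), and the '<tu>' placeholder is simply a hypothetical marker for "a word that sounded like to/too/two".

```python
# Pick between sound-alike words using the neighbouring words as context.
# The tiny tables of counts are invented for illustration only.

HOMOPHONES = ("to", "too", "two")

# How often each candidate was seen after a given previous word.
AFTER = {
    "gave": {"two": 8, "to": 3, "too": 1},
    "balls": {"to": 9, "too": 2, "two": 1},
    "boy": {"too": 7, "to": 2, "two": 1},
}
# How often each candidate was seen before a given next word.
BEFORE = {
    "balls": {"two": 9, "to": 1, "too": 1},
    "the": {"to": 9, "two": 2, "too": 1},
    "<end>": {"too": 8, "to": 1, "two": 1},
}


def pick(previous_word, next_word):
    """Return the homophone that best fits between the two neighbouring words."""
    def score(candidate):
        # Unknown context words contribute a neutral count of 1.
        left = AFTER.get(previous_word, {}).get(candidate, 1)
        right = BEFORE.get(next_word, {}).get(candidate, 1)
        return left * right
    return max(HOMOPHONES, key=score)


def transcribe(heard):
    """Replace each '<tu>' placeholder (an ambiguous sound) with the best-fitting word."""
    padded = ["<start>"] + heard + ["<end>"]
    words = []
    for i, token in enumerate(padded[1:-1], start=1):
        words.append(pick(padded[i - 1], padded[i + 1]) if token == "<tu>" else token)
    return " ".join(words)


print(transcribe(["he", "gave", "<tu>", "balls", "<tu>", "the", "other", "boy", "<tu>"]))
# he gave two balls to the other boy too
```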

This leads to the second 'magic' part of modern speech recognition.  Statistically speaking, computers can accurately predict the next word in a phrase based on the words before it.  Indeed, as an immediate and trivial example, think about the sentence immediately before this one: if its last word were missing, you could probably guess that it would be 'it'.  Studies have shown that computer statistical models are more accurate at completing phrases than we as people are when we intuitively do the same thing.
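Next-word prediction of this kind can be sketched in a few lines.  The example below is a deliberately tiny illustration, assuming a made-up training text of three sentences; it counts which word follows each pair of words and then predicts the most frequent follower.  Commercial products use far larger and more sophisticated statistical models.

```python
from collections import Counter, defaultdict

# A tiny training text, invented for illustration.
TEXT = (
    "computers can predict the next word based on the words before it . "
    "we guess the next word based on the words before it . "
    "the computer reads the words before it guesses ."
)


def train(text):
    """Count, for each pair of consecutive words, which word follows the pair."""
    words = text.split()
    following = defaultdict(Counter)
    for first, second, third in zip(words, words[1:], words[2:]):
        following[(first, second)][third] += 1
    return following


def predict(model, first, second):
    """Return the most frequent word seen after the pair, or None if unseen."""
    counts = model.get((first, second))
    return counts.most_common(1)[0][0] if counts else None


model = train(TEXT)
print(predict(model, "words", "before"))  # it
```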

Early products released to the public in the mid 1990s

The various techniques for speech recognition were massively refined during the 1980s.  After various experimental and high end products had been released to limited markets, 1995 saw the release of the first public speech recognition software.  This software, released by Dragon, was a "discrete word" product that required the speaker to clearly enunciate each individual word separately.

Two years later, in 1997, speech recognition software as we know it today appeared.  This new Dragon product, called "NaturallySpeaking", did exactly what its name implies.  No longer did a speaker need to sound each word individually.  Instead, they could speak in a normal conversational voice, and the computer would be able to break a steady flow of sound into individual words, even if there was no perceptible pause or break between the end of one word and the start of the next.

Since that time, the various companies offering speech recognition software have all merged, and there is one major company remaining -- Nuance Communications, which sells its product under the Dragon NaturallySpeaking name.

The product, now at version 10.1, has continued to improve over the years, and to make better use of the ever more powerful computers available.  One could pointlessly debate whether earlier versions of the software were truly ready for prime time; the key issue which this article series attempts to address is whether the current version is now something that you should consider for yourself.

The Difference between Discrete and Continuous Speech

Think about how you or anyone else normally talks.  You run your words together, with almost no pause between the end of one word and the start of the next.  Indeed, people will sometimes use the end of one word to modify the start of the next word, either deliberately as a type of slang, or unconsciously because it makes for easier speech.

For example, the phrase 'It's a big one' might be pronounced 'It sa bigwun'.  The first two words have been broken at a point so that part of the first word spills into the second word, making both words sound different, and the last two words are pronounced as if they were a single word.  Or maybe the first two words will be run together the same way as the last two words, as 'itsa'.

A discrete speech recognition system would require each word to be carefully sounded out separately.  This is not the way we talk, and so makes discrete speech recognition systems less convenient.
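To give a feel for the extra work a continuous system has to do, here is a loose analogy in code.  It works on letters rather than sound, and the vocabulary is invented, so treat it as a sketch of the idea only: a discrete system is handed the words already separated, whereas a continuous system has to find the word boundaries for itself.

```python
# Split a run-together string into dictionary words, loosely mirroring how a
# continuous recognizer must find word boundaries that the speaker never marked.
# Real systems work on sound rather than letters; this vocabulary is invented.

VOCABULARY = {"it", "its", "a", "big", "on", "one"}


def segment(text, vocabulary=VOCABULARY):
    """Return one way to split text into vocabulary words, or None if impossible."""
    # splits[i] holds a word list covering text[:i], or None if none was found.
    splits = [None] * (len(text) + 1)
    splits[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            word = text[start:end]
            if splits[start] is not None and word in vocabulary:
                splits[end] = splits[start] + [word]
                break
    return splits[len(text)]


# Discrete input: the speaker has already separated the words for us.
print("its a big one".split())  # ['its', 'a', 'big', 'one']

# Continuous input: the boundaries have to be worked out.
print(segment("itsabigone"))  # ['its', 'a', 'big', 'one']
```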

A continuous speech recognition system will happily understand what you say.  To prove my point, I will pronounce that short phrase four different ways: first, sounding each word separately; second, running the words together in a single utterance without pause; third, as three words with the first two words broken in the wrong place; and fourth, breaking the phrase into two two-word groups.  Let's see how Dragon understands me.  You can also see the CPU loading on the computer while Dragon is hard at work.

How to best watch the sample video

If you have a reasonably fast Internet connection, I would recommend that after you click on the play button, you then increase the resolution of the video from its default 360 setting, and possibly keep on going up past 480, perhaps all the way to either 720 or 1080. You should then increase the video size so it fills your screen, and that way with the larger video image and the higher resolution you can clearly see the text appearing on the video of my screen as I speak.

The option to change the resolution appears on the bottom line, but only after you have started playing the video. If you want the video to go fullscreen, you should click on the button next to the video resolution option button that has the four arrows pointing out to the corners.

Alternatively, click on this link to open up a regular YouTube page in a separate browser window.

Note - the second video in this two part video will be available next week.
 

Technical notes about this video

This test was done on a Dell E6400 with an Intel Core 2 Duo T9600 CPU at 2.8GHz and 4GB of DDR2 memory, running 32-bit Windows 7, with a Logitech ClearChat Pro USB headset.

NOTE: The sound you hear is NOT from the Logitech headset; it is recorded from the microphone on the camcorder.  The sound that Dragon would hear from the Logitech headset would be very much better, with less background noise.

Summary of Part 1 of this Article Series

Modern speech recognition systems are designed to work best when you speak normally, and in a continuous flow.  The software, which has evolved over the last 40 or so years, is still not perfect, but it is getting impressively close.

Please read on to the second part of our series, where we talk about whether or not your type of work is well suited to speech recognition.

(And, of course, there's lots more good stuff in the subsequent parts of the series too.)
 


 

Originally published 7 May 2010, last update 08 Jul 2017

You may freely reproduce or distribute this article for noncommercial purposes as long as you give credit to me as original writer.

 
 
Related Articles
An Introduction to Speech Recognition
Is Speech Recognition Suitable for You
Accuracy and the Importance of using the best hardware

Coming next week
Choosing a Microphone
Dragon NaturallySpeaking
 
 
 

 

