Computers and voice recognition technology
by David J Bilinsky
Say you’ll never cover your ears and close your mouth And live in a silent world... Say it say it say it... Tell it like it is...
Writer and vocalist: Tracy Chapman
Ahh! The dawn of a new day! On coming into your office, you turn on your computer and put on a headset with a microphone. That will be the last time you touch any part of your computer until you leave. You then start issuing instructions: “Wake up!” “Launch WordPerfect”. “Click ‘File’”. “Click ‘Open’”. “Click ‘letter form’”. The letter precedent now being on your screen, you start to dictate in a natural voice, without hesitations or breaks between words: “File No. 98-1234.” New line. New line. April 1, 1998. New line. “Gottum Coming and Going”. New line. “Barristers & Solicitors”. New line. And so on and on and on.... When you are finished you say “Click Print”... and the finished letter scrolls out of your printer...
Sounds far-fetched? Too much like Robby the Robot or Star Trek? Thanks to some hard-working computer programmers, voice recognition technology has now come out of the laboratory and into the workplace. Chances are you have already used this technology, if you have used a ‘hands-free’ cellular phone or a voice-mail system that asks you to speak rather than press keys to select options. The good news is that this technology is cheap. The bad news is that this technology is cheap. By this I mean that most of the costs of converting to this technology are hidden or soft costs. But let’s jump to the specifics:
- The two competing systems are IBM’s ViaVoice and Dragon Systems’ NaturallySpeaking. Both IBM and Dragon have issued earlier voice recognition products, but those required you to pause between words. The latest generation allows ‘natural’ voice recognition.
- Cost you say? Under $300 and dropping. At least for the software.
- Hardware requirements: This software requires horsepower! While the minimum requirement is a Pentium or equivalent at 133 MHz, I wouldn’t consider anything less than 233 MHz for reasonable performance.
- Operating systems: Forget DOS. These systems require Windows 95 or NT 4.0.
- Memory: 40 megs (48 for NT) RAM minimum but again, for reasonable performance I would get at least 64 and preferably 80.
- CD-ROM drive is a necessity, as the software only comes on a CD-ROM.
- Sound Card: A high quality 16-bit sound card such as a Sound Blaster 16 or AWE32 or better. The sound card has a direct impact on the ability of the software to recognise your voice.
- Microphone: These systems come with a headset and microphone. Of course, the better the mike... A suggestion made by an attendee during my recent presentation at the CBA’s President’s Forum on the Future of Solicitor’s Practice was to acquire a hand-held mike at Radio Shack rather than use the headset/mike combo. That saves the headset mike from being hit by the phone every time you answer a call.
- Hard drive space: 64 megs minimum. To store more voice files, of course, you need more.
- Word Processors: Each system has its own word processor, but to dictate directly into MS Word you can use either the IBM or the Dragon system. To go into WordPerfect directly, only Dragon will currently work. For now, you must download the patch files for these word processors from Dragon’s home page and install them on your system. Dragon also states that their system will work directly in any Windows application that has a patch. Stay tuned.
- Other computer commands: Dragon includes on their CD another software product that allows you to directly speak commands to your computer.
- Correction: IBM’s system requires you to use a keyboard to make corrections. Dragon’s system allows you to make corrections by voice.
- Learning: To start, both systems require you to recite long passages to allow the system to create and then update voice files that become tuned to your voice. This requires time and patience especially at the beginning, as you must keep building the voice files.
- Time: Did I mention that these systems require time and patience?
- Carbon-based learning: While the system learns by your using the “correction” and “train” components to fine-tune the system to your voice, you must use your correction circuits to learn the nuances of the software. At the beginning, this can be somewhat frustrating, as you would expect a voice-recognition system to understand what you say. Wrong. It is akin to teaching a child.
- Unlearning: For those who can already type, the hurdle that must be overcome in coming to grips with the software is perhaps doubly hard. The reason is that it is easier and faster to make corrections by typing than by making the correction through the “correct that” and “train” modules of the software. No pain, no gain...
- Syntax: The system does not recognise syntax errors. That is why it is good to dictate directly into WordPerfect 8.0 or Word 8, which will at least flag some grammatical and syntax errors. Either way, working with voice recognition turns you into a very good proof-reader, since “...following the signing of the contract, we are in for a thrill” can easily come out as “...following the signing of the contract we are in for the kill”.
- Human issues: The bigger issues with voice recognition go beyond hardware and software. They relate to integration, work and paper flows, and management. More than any other recent software development, voice recognition offers the prospect of vastly changing the way we produce legal products.
- Secretaries: Unless you want major changes, these systems are not ready to replace your secretary. Obviously, they won’t take calls if you are on the phone, they won’t give quick answers to clients and they won’t understand office shorthand such as “do a letter to so and so acknowledging service of her correspondence and demanding copies of Part I of their List of Documents.”
- Strategic advantages: Whether or not the present versions meet your needs, there is no doubt that in the long term, this is a transformational technology. Early adopters stand to make considerable inroads if they figure out how to properly work voice recognition into their systems.
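
For the technically inclined, the hardware figures in the list above can be gathered into a quick checklist. What follows is only a rough sketch in Python: the `Machine` record and the `check_machine` function are hypothetical names made up for illustration, not anything supplied by IBM or Dragon, and the thresholds are simply the numbers quoted in this article (133 MHz minimum and 233 MHz suggested; 40 megs of RAM, or 48 for NT, with 64 suggested; 64 megs of disk; a CD-ROM drive; a 16-bit sound card).

```python
from dataclasses import dataclass
from typing import List

# Operating systems named in the article as supported.
SUPPORTED_OS = ("Windows 95", "Windows NT 4.0")


@dataclass
class Machine:
    """Hypothetical description of an office PC (illustration only)."""
    cpu_mhz: int
    ram_mb: int
    free_disk_mb: int
    os_name: str
    has_cd_rom: bool
    sound_card_bits: int


def check_machine(m: Machine) -> List[str]:
    """Return FAIL/WARN notes; an empty list means nothing to flag."""
    notes = []
    if m.os_name not in SUPPORTED_OS:
        notes.append("FAIL: operating system must be Windows 95 or NT 4.0")

    # Processor: 133 MHz is the stated minimum, 233 MHz the suggestion.
    if m.cpu_mhz < 133:
        notes.append("FAIL: processor below the 133 MHz minimum")
    elif m.cpu_mhz < 233:
        notes.append("WARN: processor below the suggested 233 MHz")

    # RAM: the minimum is higher under NT.
    min_ram = 48 if m.os_name == "Windows NT 4.0" else 40
    if m.ram_mb < min_ram:
        notes.append(f"FAIL: RAM below the {min_ram} MB minimum")
    elif m.ram_mb < 64:
        notes.append("WARN: RAM below the suggested 64 MB")

    if m.free_disk_mb < 64:
        notes.append("FAIL: less than 64 MB of free hard drive space")
    if not m.has_cd_rom:
        notes.append("FAIL: no CD-ROM drive to install the software")
    if m.sound_card_bits < 16:
        notes.append("FAIL: sound card is not 16-bit")
    return notes


if __name__ == "__main__":
    office_pc = Machine(166, 48, 200, "Windows 95", True, 16)
    for note in check_machine(office_pc):
        print(note)
```

A machine that only draws WARN notes will run the software, but probably not at a speed you will enjoy. None of this touches the soft costs of time, patience and training, which no checklist will measure.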
While some of us have been speaking to computers for some time, for the first time computers are now able to respond. The onus is now on each of us to decide whether to remain in a silent world or to open up a whole new dimension and then tell it like it is.
David J Bilinsky is a partner at Lakes Straith & Bilinsky and a principal at Integral Management Inc.
This article originally appeared in the April 1998 issue of BarTalk and is reproduced here with permission of both the author and the Canadian Bar Association, British Columbia Branch.