Everything happens simultaneously: the physician examines the patient and reports the results into the microphone, while the speech appears as finished text on a computer screen.
Photo: Rune Petter Ness

From speech to text

Physicians’ handwriting can be hopeless to decipher. But their speech may soon be understood by a machine.

The physician speaks into the microphone. The words appear on the screen as she speaks them. But this is not always as straightforward as it might seem. Perhaps she speaks in a dialect, and says «ol’ doc’ of the patient» rather than «the old doctor of the patient». The computer program searches for a moment for the phonetically reduced words, but they are not found in the pronunciation database. So they must be added to it.

People have a nasty habit of not completing words and sentences in the course of normal, natural speech, and this doctor is no exception. She starts out with one word, but changes her mind halfway and chooses a different word instead. Thus the program must understand that when she says «reali-» she intended to say «realize» – «reali» is not a proper word. Or when she says «search, er, find» the program must know to ignore «search» and that «find» is the word she wants.
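The two cases above can be illustrated with a minimal sketch. It is not the system under development at NTNU: the filler list, the tiny vocabulary and the function names are all hypothetical, and the rule "a filler cancels the word before it" is a deliberately naive assumption that a real recognizer would replace with a statistical model.

```python
# Hypothetical filler words and vocabulary, chosen only for illustration.
FILLERS = {"er", "uh", "um"}
VOCABULARY = {"realize", "search", "find", "the", "cause", "we", "must"}


def complete_fragment(fragment, vocabulary):
    """Return the single vocabulary word that starts with the fragment, or None."""
    matches = [word for word in vocabulary if word.startswith(fragment)]
    return matches[0] if len(matches) == 1 else None


def clean_transcript(tokens, vocabulary=VOCABULARY, fillers=FILLERS):
    """Remove fillers, repair self-corrections and complete cut-off words.

    Naive assumptions: a filler followed by another word means the speaker
    is replacing the word said just before the filler, and a token ending
    in "-" is a cut-off word.
    """
    cleaned = []
    for index, token in enumerate(tokens):
        if token in fillers:
            # Drop the abandoned word only if a replacement follows.
            if cleaned and index + 1 < len(tokens):
                cleaned.pop()
            continue
        if token.endswith("-"):
            completed = complete_fragment(token[:-1], vocabulary)
            # Keep the word only when the completion is unambiguous.
            if completed is not None:
                cleaned.append(completed)
            continue
        cleaned.append(token)
    return cleaned
```

Calling clean_transcript(["search", "er", "find", "the", "cause"]) keeps only «find the cause», and the fragment «reali-» is completed to «realize» because it matches exactly one word in the assumed vocabulary. The naive filler rule would also delete a perfectly good word in front of a simple hesitation, which is one reason the real problem is hard.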

Postdoctoral fellow Bojana Gajić at NTNU is grappling with all these problems. Because a computer knows only what it has been taught, many roadblocks remain before a computer program can learn to handle the idiosyncrasies of human speech. Tasks like these fall to researchers in speech technology.

Time saving

Back to the physician. If she’s a resident at a large, busy hospital, she’ll want to spend as little time as possible updating patient records. Such routine tasks should take a bare minimum of time and any software ought to be as trouble-free as possible.

Gajić knows this, and is striving to create a system for speech recognition so that a physician can file oral reports – complete with all the hesitation, repetition, and other «mistakes» that are the nature of spontaneous speech. The spoken word should immediately appear on the screen in a form that can be checked for errors and then included in the electronic records database, instantly available to whoever has access to it.

This is Gajić’s vision, but it is not yet reality. Gajić does not know when the program will move from vision to reality. What is clear is that the system must be free of bugs before it can be used. If physicians are given a system that constantly misunderstands speech, they will stop using it. Even if the hospital as a whole saves time by making the transcriptionist’s work superfluous, physicians would have to spend more time making the program work.

The problem of noise

Gajić’s earlier work focused on the problem of background noise in speech recognition. This is a critical issue for a different speech recognition system, the MOBEL project (MOBile ELectronic Patient Record). MOBEL consists of a portable terminal with access to important patient data that physicians might need as they move around the hospital, for example during rounds. Vital information can be retrieved or entered using speech.

Background noise poses a particular problem for this kind of speech recognition program. When you sit in your office and dictate a report, background noises like the hum of a light or the whirr of a computer fan are seen by the computer as a constant. But moving between different rooms means the background noise will vary. That means the system must be capable of filtering out irrelevant sounds and adjusting to a constantly changing acoustic image.
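One simple way to picture "adjusting to a constantly changing acoustic image" is a running estimate of the background level that is updated whenever nobody seems to be speaking. The sketch below is an illustration under stated assumptions, not the method used in MOBEL: the function name, the smoothing factor alpha and the threshold are hypothetical, and the input is assumed to be a list of per-frame energy values that begins with background noise only.

```python
def track_noise_floor(frame_energies, alpha=0.9, threshold=3.0):
    """Label frames as speech (True) or background (False).

    The noise estimate follows the background through exponential
    smoothing, but only on frames judged to contain no speech.
    """
    if not frame_energies:
        return [], 0.0

    # Assumption: the first frame contains background noise only.
    noise = frame_energies[0]
    labels = []
    for energy in frame_energies:
        is_speech = energy > threshold * noise
        if not is_speech:
            # Let the estimate drift slowly toward the current background.
            noise = alpha * noise + (1 - alpha) * energy
        labels.append(is_speech)
    return labels, noise
```

With a steady office hum the estimate barely moves. When the physician walks into a slightly louder room, the estimate creeps upward and the speech threshold rises with it. A sudden large jump in background noise would be mistaken for speech by this sketch, which hints at why varying noise is harder to handle than constant noise.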

By Tore Oksholen