Finding Pleasant Human Voices for Computer Apps

So now the big rush is to update and upgrade the human voice for the connected car and for the voices embedded in communications devices, like the iPhone’s Siri.

Honda, Audi, Tesla, Cadillac, Ford, and Toyota all want a voice to talk to you in their cars. They want a voice you won’t turn off because it’s irritating. Many companies misfired the first time out by choosing robotic-sounding voices. They soon found that consumers were turning off their technology because the voice interface was irritating. Now, thousands of files later, they have to find a new voice and redo their filesets. We’ve had good luck finding voices that fit perfectly. The voice we found for Cadillac sounded just like you’d think a Cadillac would sound: pleasant, authoritative, classy, elegant.

If you could pick the perfect voice for your connected car to speak to the driver, what would your parameters be? “Pleasant” would probably top the list. But a voice has to meet many more criteria than that to be perfect. After all, we’re not just recording one radio spot. We’re hiring a voice talent who can record for us for 19+ years. That’s the sustainability issue in casting voice talent.

As we wrote in an article for Telematics Update magazine, the voice actor needs to have perfect diction, be free of objectionable dialect, sound lively while still sounding professional, and carry a concern in the voice that says they actually care about you. That is a far different set of criteria from the robotic voices most engineers chose initially.

In the early days of voice recording for automated systems, many of these characteristics were ignored in favor of someone who sounded robotic. The goal was to be more machine-like than human. Also, because memory space was limited in early automated systems, single words were recorded, which the computer would then combine to make a phrase or sentence. So engineers thought it was a good thing to have all the words read in a flat, cold, monotone voice. That way the sentences could be fabricated easily. For an idea of what we’re talking about, click on “BADLY DONE VOICE FILES” on our site and hear this style of voice recording.

But soon end users were simply turning off these robotic voices because they were annoying and cold. Engineers then favored recordings with more “humanity” in the voices, and expanded memory allocation allowed whole phrases to be recorded as complete files instead of just single words. This improved the “humanity” in the voice files. The emphasis then swung to recording phrases so that they’d fit together seamlessly and sound as if they had been recorded as a single sentence.
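The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration, not how any particular navigation system works: the file names, the tiny phrase inventory, and the function names are all made up for the example. The first function builds a sentence one word file at a time, the way early systems did. The second covers the same sentence with the longest recorded phrases it can find.

```python
# Hypothetical inventories of recorded files (names invented for illustration).
WORD_FILES = {
    "turn": "turn.wav",
    "left": "left.wav",
    "in": "in.wav",
    "two": "two.wav",
    "miles": "miles.wav",
}

PHRASE_FILES = {
    "turn left": "turn_left.wav",
    "in two miles": "in_two_miles.wav",
}


def word_playlist(sentence):
    """Early approach: play one recorded file per word."""
    return [WORD_FILES[word] for word in sentence.lower().split()]


def phrase_playlist(sentence):
    """Later approach: cover the sentence with the longest recorded phrases."""
    words = sentence.lower().split()
    playlist = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at word i, then shorter ones.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in PHRASE_FILES:
                playlist.append(PHRASE_FILES[candidate])
                i = j
                break
        else:
            # No recorded phrase starts at this word.
            raise KeyError(f"no recording covers: {words[i]!r}")
    return playlist
```

For “In two miles turn left”, the word-level version strings together five files, while the phrase-level version needs only two. Fewer joins means fewer places where a mismatch in tone or pacing can be heard, which is why the phrases have to be recorded to fit together.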

The voice filesets we’ve been recording for Alpine’s navigation systems since 1999 have this quality. CLICK HERE to hear the “Seamless” style of recording voice phrases.

Voice files that combine seamlessly are the key to better human-emulating systems. And so, in casting, you need a voice talent who can match the tone, pacing, energy, modulation, warmth and volume of files they recorded years ago. An analogy: the voice talent needs the same ability as figure skaters doing their compulsory figure eights, in which their blades must follow the same groove they cut on their previous circuit.
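Of the qualities listed above, volume is the easiest one to check with a number; tone, warmth and energy still have to be judged by ear. As a rough sketch only, and not a description of our own quality-control process, here is one way to compare the average level of a new take against an older reference file. It assumes the audio has already been loaded as a list of samples between -1.0 and 1.0; the function names and the 1 dB tolerance are arbitrary choices for the example.

```python
import math


def rms_dbfs(samples):
    """Average (RMS) level in dB relative to full scale, for samples in [-1.0, 1.0]."""
    if not samples:
        raise ValueError("no samples to measure")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # pure digital silence
    return 10 * math.log10(mean_square)


def levels_match(reference, new_take, tolerance_db=1.0):
    """True if the two clips' RMS levels are within tolerance_db of each other."""
    return abs(rms_dbfs(reference) - rms_dbfs(new_take)) <= tolerance_db
```

A check like this only catches a take that is plainly louder or quieter than the old files. It says nothing about whether the two recordings sound like the same person in the same mood.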

And so when we do our voice casting (we find voice talent as well as record them), we listen not only for a pleasant voice but for a well-controlled voice. Often, we find singers have this athleticism in their voice. Voice-over talent who just record thirty-second spots don’t often have the ability to repeat exactly the variances they had in their voice three years ago. And they tend to overmodulate (adding that infomercial, in-your-face, marketing, sing-song delivery). The ideal voice talent has the ability to record files that match what they did three years ago. An analogy would be a brick maker: every brick must match in color, weight, size, consistency, texture, shape, edges, and so on. We’re not interested in voice talent who can do a spectacular read one time but can’t repeat it four years later. We’re building a vast “brick wall” of voice files and they all need to fit perfectly together. And we want voice talent who can “put a smile” in their recordings. You can actually hear that smile.

Luckily, we’ve found voice talent who meet these criteria. We’ve been building filesets with them for Alpine for over sixteen years now. If you play a voice file recorded in 1999, it will match seamlessly with one recorded in 2011. This kind of quality results from choosing the best voice talent at the very beginning. We hope you fare well in your choice of voice talent so you only have to do the fileset once, and not start all over again like Apple did with Siri.

(Fletcher Murray’s voice recording team has voicecast and recorded hundreds of thousands of voice files for filesets for IBM, Alpine, Johnson Controls, Visteon, deCarta, Honda, Acura, Cadillac, Clarion and Microsoft. The Association’s quality control has achieved Six Sigma levels in error-free performance.)
