Finding Pleasant Human Voices for Computer Apps

So now the big rush is to update and upgrade the human voice for the connected car and the voices embedded in communications devices, like the iPhone’s Siri.

Honda, Audi, Tesla, Cadillac, Ford, and Toyota all want a voice to talk to you in their car. They want a voice you won’t turn off because it’s irritating. Many companies misfired the first time out by choosing robotic-sounding voices. They soon found out that consumers were turning off their technology because the voice interface was irritating. Now, thousands of files later, they’ve got to find a new voice and redo their filesets. We’ve had good luck finding voices that were perfect. The voice we found for Cadillac sounded just like you’d think a Cadillac would sound: pleasant, authoritative, classy, elegant.

If you could pick the perfect voice for your connected car to speak to the driver, what would be your parameters? “Pleasant” would probably top the list. But the perfect voice has to meet many more criteria. After all, we’re not just recording one radio spot. We’re hiring a voice talent who can record for us for 19+ years. That’s the sustainability issue in casting voice talent.

As we wrote in an article for Telematics Update magazine, the voice actor needs to have perfect diction, be free of objectionable dialect, sound lively while still sounding professional, and carry a concern in the voice that says they actually care about you… a far different set of criteria from the robotic voices most engineers chose initially.

At the beginning of voice recordings for automated systems, many of these characteristics were ignored in favor of someone who sounded robotic. The goal was to be more machine-like than human. Also, because memory space was limited in early automated systems, single words were recorded and then combined by the computer to make a phrase or sentence. So engineers thought it was a good thing to have all the words read in a flat, cold, monotone voice. That way the sentences could be fabricated easily. For an idea of what we’re talking about, click on “BADLY DONE VOICE FILES” on our site and hear this style of voice recording.


But soon end users were just turning off these robotic voices because they were annoying and cold. Engineers then favored voice recordings with more “humanity” in the voices, and expanded memory allocation allowed phrases to be recorded as complete files instead of just single words. This improved the “humanity” in the voice files. The emphasis then swung to recording phrases so that they’d fit together seamlessly and seem like they were recorded as a single sentence.
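The idea of a computer stitching recorded phrases into a sentence can be pictured in a few lines of code. This is only a minimal sketch, not the method any particular navigation system uses. It assumes each phrase is stored as an uncompressed WAV file with identical audio settings, and the function name `join_phrases` and the file names in the note below are made up for illustration.

```python
import wave


def join_phrases(phrase_paths, out_path):
    """Concatenate WAV phrase files, in order, into one sentence file.

    Assumes every file shares the same channel count, sample width and
    sample rate; raises ValueError if a file differs from the first one.
    """
    if not phrase_paths:
        raise ValueError("need at least one phrase file")

    params = None
    chunks = []
    for path in phrase_paths:
        with wave.open(path, "rb") as src:
            fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError(f"{path} does not match the first file's audio format")
            chunks.append(src.readframes(src.getnframes()))

    # Write the raw audio of each phrase back to back, with no gap added.
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(params[0])
        dst.setsampwidth(params[1])
        dst.setframerate(params[2])
        for chunk in chunks:
            dst.writeframes(chunk)
    return out_path
```

Calling `join_phrases(["in_500_feet.wav", "turn_left.wav"], "sentence.wav")` (hypothetical file names) butts the files together. The code only guarantees that the formats line up. Whether the result sounds like one sentence still depends on the talent matching tone, pacing and volume from file to file.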

The voice filesets we’ve been recording for Alpine’s navigation systems since 1999 have this quality. CLICK HERE to hear the “Seamless” style of recording voice phrases.

Seamless Voice Files

Voice files that combine seamlessly are the key to better human-emulating systems. And so, in casting your talent, you need a voice talent who can match the tone, pacing, energy, modulation, warmth and volume in files they recorded years ago. An analogy: the voice talent needs the same ability as figure skaters doing their compulsory figure eights, in which their blades must follow the same groove they cut on their previous circuit.

And so when we do our voice casting (we find voice talent as well as record them) we listen for not only a pleasant voice but a well-controlled voice. Often, we find singers have this athleticism in their voice. Voice-over talent who just record thirty-second spots don’t often have the ability to repeat exactly the variances they had in their voice three years ago. And they tend to overmodulate (adding that infomercial, in-your-face, marketing, sing-song delivery). The ideal voice talent has the ability to record files that match what they did three years ago. An analogy would be a brick maker: every brick must match in color, weight, size, consistency, texture, shape, edges, etc. We’re not interested in voice talent who can do a spectacular read one time but can’t repeat it four years later. We’re building a vast “brick wall” of voice files and they all need to fit perfectly together. And we want voice talent who can “put a smile” in their recordings. You can actually hear that smile.

Luckily we’ve found voice talent who meet these criteria. We’ve been building filesets with them for Alpine for over sixteen years now. If you play a voice file recorded in 1999 it will match seamlessly with one recorded in 2011. This kind of quality results from choosing the best voice talent at the very beginning. We hope you fare well in your choice of voice talent so you only have to do the fileset once, and not start all over again like Apple did with Siri.

(Fletcher Murray’s voice recording team has voicecast and recorded hundreds of thousands of voice files for filesets for IBM, Alpine, Johnson Controls, Visteon, deCarta, Honda, Acura, Cadillac, Clarion and Microsoft. The Association’s quality control has achieved Six Sigma levels in error-free performance.)

We’re so good, we can’t talk about it….

Funny thing.  We work for great clients.  They love our work (finding and recording perfect voices for their computerized applications).  So what’s the problem?

We ask them for success stories but they tell us they’re bound by their corporate anonymity policies.

So after much begging they agreed to review our work on the condition of anonymity. Below are their quotes (can you pick the one I wrote?). The rest are real.

“High quality voices in many languages, excellent translation services, direct translation review to ensure accuracy, friendly staff. ” (Senior Project Manager)

“Professional and cost effective recording of voice prompts for use in vehicle navigation applications.” (President of Company)

“A full service voice recording company.  identifying candidates for recorded voice applications according to customer requirements, signing voice talent to contracts for the term required by the customer, performing requirement engineering on the scripts to minimize unnecessary changes, making arrangements for studio time, arranging and managing recording sessions, and performing QA on the voice files. The Association also records catch up recordings as changes are needed.” (Senior Project Manager).

“Procurement and retention of voice talent and coaches in 3 languages Project management, planning, scheduling…Ability to provide accurate schedule and cost estimates …Transparent and documented quality control process from start to finish.” (Senior Project Manager)

“The Association is a very well organized company that treats each project with the utmost importance and your precision to detail. During script reviews I like that the Association brings in expert language coaches to cypher through hundreds and thousands of lines to find any grammatical errors.

During the recording sessions I appreciate all of the work that your team does (from listening to “clicks” to editing recorded files on the spot). The quality control that goes into each recorded voice file. The “Legal Buffer” that you provide to protect us from any legal issues that may arise.”  (Project Lead)

“Fletch is the hardest working white guy I know. To see the way he’s organized the chaotic business of voice casting and recording into a process that achieves Six Sigma levels of error free voice files is a marvel for anyone interested in not wanting to worry if they’re going to make their deadlines and stay on budget.  I’d hire him in a minute and relax.”  (Tall Person)

(We’ve worked for Alpine, IBM, deCarta, Microsoft, Clarion, Johnson Controls and others since the beginning of car navigation systems over fifteen years ago. We’re approaching 60,000 error-free voice files on sustainable filesets for returning customers. We’re always willing to add peace and sanity to this process for you. Please call my cell at 818 606-3538.)

Finding the Perfect Voice

Searching for the “perfect” voices for HMI (Human Machine Interface) in smart devices, biometrics, natural language, telematics and navigation systems is challenging. Not only does the voice talent need to sound good today, but they must also be able to match their performance for years into the future as technology develops. But if you choose the right voice, it can save you millions of dollars by avoiding the need to re-record a whole new fileset when a voice proves unsustainable in the short term.

Almost all companies face this process…it’s almost like computer dating. You want to find the “perfect” one. The one that will be yours forever.

Well, we can’t promise you we’ll find you voices that will be perfect forever, but we’ve found voices that have been perfect for our customers since 1999. Contrast that with Apple’s quick replacement of Siri’s voice. Audi, General Motors, Elektrobit Automotive, Ford, Chrysler and other manufacturers interested in HMI applications have to realize that the voice chosen IS THE DEVICE to the consumer. Consumers either like the voice or they don’t. The thinking behind the original Siri voice is baffling to us. It reminds us of the mistakes engineers were making at the beginning of voice prompts for navigation systems. They wanted the voices to sound like computers for some unknown reason. Do you want a cold, impersonal, heartless voice telling you what to do?

Most users of car navigation systems admit they turn their systems off. What manufacturers don’t realize is that the voice in a device IS their company to the consumer. Shouldn’t the voice of Cadillac sound like what you imagine a Cadillac would sound like if it had a voice?

Well, that’s what we do. If Nuance had asked us to find a voice it would have been the coolest voice you can imagine. But they didn’t. So now we can offer that coolest voice to other clients.

 

15 Criteria for picking the “perfect” voice

I think if Apple and Nuance had a “Voice Talent Evaluation Grid” they would not have ended up with a voice that they would replace in three years…at great expense.  So how do you pick a “perfect” voice?

It helps to have a template when auditioning voice talent.  We have a checklist template that has fifteen criteria to help us pick the best overall voice talent.

Choosing the voice talent to record for new VUI systems usually involves five people or more. Each “likes” different voices for different reasons. But pleasantness of the voice is just the start. There are fourteen more criteria to be measured on our checklist. For example, you have to listen for clicks, pops and saliva noises, which will all have to be removed or rerecorded. If the voice talent has bad comprehension and makes an error every third line, your budget has just grown 33%. When you’re recording 109,000 words in a script you’ve got to have a clean voice talent.
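To see where that 33% comes from, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every flubbed line costs exactly one extra take and that cost scales with the number of takes. The function name is hypothetical, and the 11,000-line script size is borrowed from the custom casting example later in this post.

```python
from fractions import Fraction


def takes_needed(script_lines, error_rate):
    """Total takes if each line with an error is re-recorded exactly once."""
    retakes = script_lines * error_rate
    return script_lines + retakes


lines = 11_000                      # illustrative script size
error_rate = Fraction(1, 3)         # an error every third line
total = takes_needed(lines, error_rate)
overhead = (total - lines) / lines  # extra work as a fraction of the plan

print(f"extra takes: {float(total - lines):.0f}")
print(f"budget overhead: {float(overhead):.0%}")
```

Under those assumptions, a talent who stumbles on every third line adds about a third more recording work before any cleanup of clicks and pops is counted.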

A checklist also helps the whole team stay on the same page. The last thing you want is for the client to dig in their heels on a voice that has so many technical flaws you’ll bust your budget cleaning up the files.   The checklist helps the client understand why you favor candidate 5 over candidate 4.

BTW, we don’t use talent names because it prejudices the choice. After all, wouldn’t it affect your choice if you knew candidates 3 and 4’s real names – Angelica and Gurdta?
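For readers who like to see a checklist as data, here is a rough sketch of how a blind scoring grid could be tallied. It is not our actual Voice Talent Evaluation Grid. It uses only criteria named in this post, weights them equally (an assumption), identifies candidates by number only, and every score in it is made up.

```python
from statistics import mean

# Only criteria mentioned in this post; the real grid has fifteen.
CRITERIA = ["pleasantness", "clean_mouth_noise", "comprehension", "endurance", "consistency"]


def rank_candidates(grid):
    """Rank anonymous candidates by their average score across CRITERIA.

    `grid` maps a candidate number to {criterion: [one score per reviewer]}.
    Returns (candidate, average) pairs, best first.
    """
    totals = {}
    for candidate, scores in grid.items():
        missing = [c for c in CRITERIA if c not in scores]
        if missing:
            raise ValueError(f"candidate {candidate} is missing {missing}")
        # Average the reviewers per criterion, then average the criteria.
        totals[candidate] = mean(mean(scores[c]) for c in CRITERIA)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up scores from two reviewers on a 1-5 scale.
grid = {
    4: {"pleasantness": [5, 5], "clean_mouth_noise": [2, 1], "comprehension": [3, 2],
        "endurance": [3, 3], "consistency": [2, 3]},
    5: {"pleasantness": [4, 4], "clean_mouth_noise": [5, 4], "comprehension": [5, 5],
        "endurance": [4, 5], "consistency": [5, 4]},
}
for candidate, score in rank_candidates(grid):
    print(f"candidate {candidate}: {score:.2f}")
```

In this invented example candidate 4 has the more pleasant voice, but candidate 5 comes out ahead once the technical criteria are counted, which is the kind of conversation a shared grid makes easier to have with a client.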

So those are a few examples of the fifteen criteria we apply in evaluating voice talent. Take a look at the Voice Talent Evaluation Grid Checklist. Click here to go to our site to download our Voice Talent Evaluation Grid Template.

The checklist helps you listen for technical flaws that will inflate the cost of the project and embed errors in the files that must be fixed downstream at even more expense. The checklist also assures that our client, their client, the sound engineer, and the producer deliver voice files at a great price that will endure for a long, long time. One voice file contract has been running seventeen years now with thousands of files recorded.

CUSTOM VOICE CASTING - Most recording studios have a fixed stable of regulars they record often. We do a custom voicecasting for each client. On our last project we cast forty-three women. Then we culled the list down to the best nineteen. The talent came in and read a carefully prepared script meant to reveal difficulties that could imperil the budget and the timeframe. For example, most voice talent are used to recording just a few pages of copy. We record 11,000 lines. If the voice talent doesn’t have endurance and consistency it doesn’t matter how good they sound. The script puts them to the test.

 

How to find the best voice for your VUI (voice user interface) application

Finding a pleasant-sounding voice is key to a successful VUI. But it’s not always easy to get it right the first time, as proven by Apple’s move away from Siri’s first voice to a broader spectrum of voices people may like better. Many auto manufacturers are facing the same task of replacing voices that fail to connect with the customer. Some are so aggravating that drivers turn off their system. This reflects negatively on the manufacturer.

We think the voice should BE the voice you’d expect to hear if your car or device could talk.

An example is the voice we placed in Cadillac. Listen to her recording our world-famous Seamless Voice Files - six files combined to sound as smooth as if she’d recorded one sentence.

Here's how a beautiful voice looks...

(above – A beautiful voice we cast for Cadillac looked like this, but listen to how her seamless voice files sound.)

How can you improve the odds of getting a voice right the first time?

In a current project for a new client, we cast 43 females from our cadre of professional Hollywood voices to find the “perfect” one. We narrowed the selections to twelve to present to the client. To help sensitize the client to the variables to listen for, we provided them with a Voice Evaluation Grid template. This helps the client listen for the qualities in the voice that will be a sustainable solution for them. Too often clients just don’t know what to listen for.

If you’d like a Voice Evaluation Grid, please email me at fletch@theassociation.tv

My next installment will go into more specifics of choosing and recording a voice for zero defect voice file sets.

Learn Digital Filmmaking in two days at the Palm Springs Photo Festival: Quickly and Easily

Learn digital filmmaking in two days in beautiful Palm Springs. Bring your Canon DSLR and get ready for a fun learning experience in aesthetic, relaxing Palm Springs, California.

Setting up the Jib Arm shot

Fletch Murray’s world-renowned CineBootCamps guide you through hands-on drills to quickly familiarize you with the powerful Canon DSLR’s video-making capabilities.  Fletch and his team have trained over 400 students worldwide.  99% of the students rate the bootcamp as “More than I expected” or “Couldn’t ask for better”.

Stats from student surveys

This is why the Palm Springs Photo Festival brought Fletch in as the video instructor. Fletch designed unique, hands-on workshops to escort still photographers over to the video side of the Canon DSLRs. (Fletch presented workshops for Brooks Institute and private training packages for the John Deere Corporation, Boeing and others.)

One film student commented that he learned more in two days at the CineBootCamps than he did in two months of film school. Fletch moves through all the menu setups and key procedures to make your films look as cinematic as the episode of “House, M.D.” that Gale Tattersall shot with the Canon 5D Mark II and 7D.

 

Fletch shows student the DVtec rig

The key difference between Fletch’s CineBootCamps and other film training is that your questions come first and the curriculum is tailored to your needs and interests. Fletch finds what’s holding back your filmmaking and frees the creativity that’s been stymied by manuals that are hard to read, out of date or written by ivory tower bookworms. And though Fletch has filmed in over 20 countries and won two Emmy awards, he rarely talks about himself or his work unless it applies to a student’s question.

 Gloria Baker

The CineBootCamps are about producing films. Not “talking about” production but producing something. So, you have to shoot a short film to make sure you can duplicate and apply the key skills needed to make a professional-looking video.

Here is a behind-the-scenes link to the filming at last year’s Palm Springs Photo Festival (2013).

The Palm Springs Photo Festival offers a BASIC VIDEO WORKSHOP and an ADVANCED VIDEO WORKSHOP.

 

- See more at: http://blog.theassociation.tv/#sthash.pO8ZJd6I.dpuf

FROZEN – Solving Seven Common Problems Still Photographers Have Shooting Video

It’s hard for a still photographer to step into the world of motion pictures. Aside from the new menu settings (hundreds of choices), there are new creative and mechanical aspects.

For three years, we’ve been training still photographers at the Palm Springs Photo Festival, Brooks, the California Photo Festival, and the CineBootCamps to shoot great video with their Canon DSLRs.

The differences between the still world and motion picture world are much more than still photographers yelling “Hold it” and filmmakers yelling “Action”.

Here are the top seven:

1) Resolution - For a still photographer, being told they’ll now be shooting at 2 megapixel resolution is like going back to the Model T Ford. But it’s true. Video shoots at a disappointing 2K: a 1920 by 1080 picture, or about 2 megapixels per frame. (That’s all TV does at the moment.) Of course, there are exceptions. You can achieve higher resolutions with the Canon 1D X or if you install the Magic Lantern software add-on.
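If you want to check the 2 megapixel figure yourself, the arithmetic is just width times height. The short sketch below does that and, for comparison, uses 5616 by 3744 as the full-size still from a Canon 5D Mark II. The function name is ours, and the comparison camera is only an example.

```python
def megapixels(width, height):
    """Pixel count of one frame, in millions of pixels."""
    return width * height / 1_000_000


full_hd = megapixels(1920, 1080)      # one frame of 1080p video
still_5d2 = megapixels(5616, 3744)    # full-size still, Canon 5D Mark II

print(f"1920 x 1080 video frame: {full_hd:.2f} megapixels")
print(f"5616 x 3744 still:       {still_5d2:.2f} megapixels")
print(f"the still holds about {still_5d2 / full_hd:.0f} times as many pixels")
```

So each video frame carries roughly a tenth of the pixels a still photographer is used to from the same camera body.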

  Jib shot from on high

2) Creating a Flood of images - The motion picture audience will not be studying a single photograph for five minutes at a time. (If they did they’d see how bad video’s resolution is.) Luckily, the single frames fly past the viewer’s eye at 24 frames per second (or higher) and their attention is hurried along from measuring resolution to following the story the pictures are telling. If the story’s great, the audience won’t care what the resolution is. (We have hands-on drills to expand the story from a single image to a flow of three-act image scenarios.)

3) Camera Movement - Unfreeze your camera.  The upside here is that by moving the camera (on a dolly, or a slider, or a jib) one can bring the third dimension (depth) to the image. This brings the audience into the picture, establishing three-dimensional space. Something stills can’t do (unless you’re shooting 3D).  Often our still photographers will video our model from just one spot, as if they are still shooting stills.

(We invite them to unfreeze and move to different angles).

 

4) Actor/Model Movement - You get to “unfreeze” your model in video….expand “the Pose” into a flow.  Still models tend to freeze in a pose.  Now they have to flow.  You have to determine where a sequence begins and ends. Then, you have to build sequences to blend together to move a story along. (We also have drills to help you unfreeze your creative mind.)

5) Time - In still photography you’re freezing time. In motion pictures you’re capturing a flow of still moments to be reborn later when played back. You could say you’re capturing the 4th dimension – time. And again, you have to determine what happens over this timespan…in each scene as well as the whole movie.  Before you just wanted an instant. Now you’re creating a communication that captures days. (We have drills to expand from a moment to a flow that captures the passage of time.)

6) Light - Still photographers set the light for one angle, one frame, one millisecond. Canon HDSLR motion picture photographers set the light according to the opening position the camera will be in. Then, they set the light for the ending position the camera will be in. Then, they check that the middle of the camera move doesn’t get any flare from light sources (which is one of the reasons you see those matte boxes on the camera lens). And again, these are different lights…continuous illumination, not flash.

7) Group Create - The camera operator moves from the still world (where they and the model are in a one on one situation), to a world of creative filming as a group, which can add a dolly grip (someone to push the dolly), a focus puller, a gaffer to handle the lights, a person capturing audio, and a clapboard person.  It’s like moving from playing golf to playing basketball.  Of course for smaller video shoots the camera person may have to wear most of those hats.

There are many more nuances in the transition from stills to video.  At the Palm Springs Photo Festival we offer two workshops – Basic Video and Advanced Video techniques.   It’s fun to watch still photographers melt from the frozen world of stills to the swarthy, lo-res, 24 frames per second fluid world of motion pictures.

(Click Here to register for the Palm Springs Photo Festival, April 28 – May 2.

Click Here for more info about Fletch’s CineBootCamps.)

 

 


DP Polly Morgan Joins the Prague Boot Camp Team!

Polly Morgan, Director of Photography Extraordinaire

The ASC’s Rising Star, Polly Morgan, will be on the Canon Boot Camp team during our March stint in Prague. Polly’s years of HDSLR experience are a welcome addition! She has worked on many, many productions across the globe, and has been mentored and taught by some of the best pros in the industry.

Polly’s career started with Ridley Scott Productions in London, working on commercial productions and feature films with cinematographers such as Haris Zambarloukos, BSC; Caleb Deschanel, ASC; Bojan Bazelli, ASC and Dan Mindel. She also trained at the American Film Institute Conservatory, receiving invaluable training from some of the world’s best:

Roger Deakins, ASC/BSC (THE SHAWSHANK REDEMPTION, NO COUNTRY FOR OLD MEN)
Tom Stern, ASC (FLAGS OF OUR FATHERS, CHANGELING)
Harris Savides (AMERICAN GANGSTER, THE GAME, ZODIAC)

Vincent Laforet's Mobius with Polly Morgan

More recently, Polly worked with Wally Pfister, ASC on Inception as well as other well known projects. Feature films she has contributed to include V for Vendetta, Hairspray and National Treasure. 

But what really interests us from the Canon perspective is the work she has done using the Canon C300, such as the narrative short “Mobius” directed by Vincent Laforet, and numerous commercial productions shot with the Canon 5D DSLR, such as “Sonova Surfboards” and “Pepsi Lost Dog.” Then there are the music videos. So it’s with great enthusiasm that we receive her in Prague for the boot camp!

Those interested in joining us can get an application online but best to hurry, as there is a limited number of students allowed for each class. We’ll be delivering 2 Pro Level I Classes (the basics), each followed the next day by Pro Level II (hands on filming). For more information about the Prague Boot Camp, click HERE.

 

Stills to Motion: Secrets of DSLR filmmaking

Palm Springs Photo Festival

GRADUATES COMMENT on our workshops

Charles Kay, Pro Photographer
(Click HERE to see Charles’ site)

(Click HERE to see Serena’s site)                                           

Gloria Baker 

(Click HERE to see Gloria’s site)

Ray Carns

(Click HERE to see Ray’s site)

 

CLICK HERE TO REGISTER FOR THE PALM SPRINGS PHOTO FESTIVAL

Professional still photographers and filmmakers gather at the Palm Springs Photo Festival each year to enjoy the refreshing aesthetic recharge in balmy Palm Springs at the elegant Korakia Pensione. We teach beginner and pro video workshops and simply must say that Jeff Dunas and his team organize top-drawer workshops, seminars and portfolio reviews, and you get to see the work of world-class leaders in the field of aesthetic imagery. I think I learn as much as my students each year.


Secret to a Clean Slate

After all these years we've got a clean slate.

In filmmaking they hold this slate (clapboard) up in front of the camera to identify the scene. They "clap" the board of the clapboard after identifying the scene.

For years we had dirty slates.  They never seemed to clean off the last scene's information. It was smudgy.