Captioners: Remembering Your Audience

Back in the days before closed captioning was mandated, regulated, and legislated, everything was clear and simple: captions were created for deaf and hard-of-hearing (HoH) people. Looking back through the rosy glow of nostalgia, captioners had a single goal and worked with like-minded broadcasters and agencies to serve our target audience.

In reality, that model has been changing since before the Television Decoder Circuitry Act was enacted in 1990. Even then, fewer than half of the country’s 500,000 caption decoders had been sold to people with hearing impairments. Today, the average American is most likely to think of captioning as something one sees in noisy bars, gyms, and airports, but the people who need captioning most are still the ones it was created for: the deaf and HoH audience.

I wrote an article about writing this article. Specifically, it’s about using Facebook as a journalistic tool for doing interviews — especially when interviewing deaf people.

High-Quality Captioning: A Conundrum

At first blush, a captioner’s goal seems simple: produce high-quality captioning. Unfortunately, that goal has two major problems. First, defining “quality.” Second, answering the critical question, “quality for whom?”

I primarily use captioning to help me keep up with the dialog on TV while the dog is barking, the grandson is talking, the phone is ringing, and the world cat-wrestling championship is taking place on the couch next to me. When I miss a word, I look at the bottom of the screen and there it is! Deaf people aren’t using captioning to fill in a few gaps. They’re using it as a substitute for the audio track. “Quality” to them isn’t the same as it is to me or to you.

When NCRA issues a CBC (Certified Broadcast Captioner) or CRR (Certified Realtime Reporter) certification, they test what’s practical to test: your ability to produce a verbatim – or near-verbatim – voice-to-text product. Getting those words transcribed and onto the screen isn’t the whole job of a captioner, though. Other facets of the captioning matter, too.

The people I interviewed for this article raised some issues that you may not normally consider part of delivering quality captioning, including:

  • Latency: The delay between the word being spoken and appearing in the captions
  • Positioning: Captions covering critical content on the screen
  • Lack of Speaker ID: Not making it clear who is speaking
  • Non-verbal Cues: Sound effects, song identification, and other non-spoken information


Latency

Dana Mulvany of Differing Abilities told me, “Significantly delayed captions end up denying access, particularly when they are cut off by commercials. They also deny access to understanding the facial expressions as well as the words.”

Delays are definitely a big issue. If the captions lag two or three seconds behind the video, it’s pretty easy to follow along and see the broadcast as a unified whole. I timed a national morning news show several times over several broadcasts, and found delays of seven to nine seconds. When watching a fast-paced newscast, it becomes difficult to understand when the captions are that far behind the video. On talk shows, I’ve measured delays of 20 seconds and more. At that point, you’re several jokes behind, and you can lose content as well.

“Delayed captions can get cut off when the program is interrupted by commercials or the end of the program, so they can be highly disruptive,” Mulvany said.

Digital satellite broadcasts delay the video by several seconds, and DTV transcoding of captions can introduce even more delay.

Philip Bravin, former Chair of the Gallaudet University Board of Trustees, commented, “Sometimes I go back to standard definition just to enjoy captions on news better, because the latency in HDTV captioning is driving me crazy.”

One way that captioners can reduce the delay in the captions is to listen to a direct audio source over the phone instead of pulling audio from a digital broadcast. Additionally, most captioning software allows you to adjust the delay time. Clearly, if the software holds back a line or more of captions, you have more time to correct errors, which makes the caption text more accurate. This, unfortunately, comes at the expense of usability, as the delay makes captions harder to follow.

There’s more to the latency story than that, however, and most of it is out of the captioner’s control. As an example, my wife (freelance stenocaptioner Kathy Robson) was doing a sports broadcast the other day. The client required the captions to be routed to several encoders. This meant that she had to dial in to the captioning firm, which split the signal and routed it to multiple destinations. I stopwatched the delay. From the time the captions left her computer until they appeared on the satellite image we were watching was just under eight seconds. You can do your part, but you can’t fix that problem.


Positioning

I’m not speaking here of purely aesthetic decisions about where to place captions, but of practical positioning decisions that affect the usability and understandability of the captioning. Typically, this means captions covering critical content on the screen.

“[It] drives me nuts when they are captioning something that is written on the screen, like David Letterman’s Top 10 List,” said Tom Willard, Editor of Deafweekly. “Why don’t the captionists look up at the screen and stop captioning when the info is right there on the screen?”

Willard is speaking of a situation where the captions needlessly duplicate what’s on the screen – and sometimes introduce errors in doing so. Back in the old days, a captioner could simply stop writing when the Top 10 List appeared. Today, the caption text is often aggregated to produce searchable video. This means captioners can’t simply stop writing.

“Data mining is just a byproduct, I would think, but the reason there are captions is guys like me,” said Bill Graham, founder of the Association of Late-Deafened Adults (ALDA). As much as the deaf community would like to believe that, broadcasters see it differently, and if the text you attach to the video is a byproduct, it’s a very important one.

There is another placement issue as well, where the captions are covering unrelated, yet still important, information. Television producers do not make this situation easy for captioners. Turn on an NFL broadcast, and you’ll see text and graphics covering nearly a third of the screen. What do you cover with the captions? The score? The other graphics? Or the game itself?

Even when the on-screen graphics consist of a single line of Chyron text, the captions often cover that text instead of bumping up a line or moving to the top of the screen. That text may contain the name and title of a person being interviewed, which isn’t mentioned in the captions. I sometimes find myself pausing the video, backing up, turning off captions, replaying a segment, and then turning captions back on, just so I can see names and titles that the captions were covering.

What can a captioner do about it? Placement is often mandated by the broadcaster, and your only option is to make sure they’re aware of the problem when they don’t leave you a place to put the captions. Most broadcasters have a television monitor somewhere in the studio showing the captions, but that doesn’t mean someone’s watching it.

“I’d guess at most of the stations who have engineers watching captions, they don’t pay too much attention until they have to,” Bill Graham noted.

Speaker Identification

Hearing people can usually tell who is speaking even when we can’t understand what they’re saying. Deaf viewers, however, rely on lip reading and other cues to identify speakers. If the speaker is off-screen or not facing the camera, they rely on the captions for speaker identification.

“I personally am hard of hearing, so I’m able to catch most of the emotional nuances when I’m listening to the TV,” said Mulvany. “I also can catch the facial expressions if I’m not listening to the sound and if the captions are synchronized.” Extreme delays definitely exacerbate the problem. It’s hard to remember whose lips were moving eight seconds ago in a fast-moving show.

There are a lot of reasons not to provide speaker identification when realtiming. It slows you down; sometimes you don’t know who’s speaking; you may not get the names in advance.

All understandable, but there is a middle ground. On a talk show, for example, having speaker IDs for the host and sidekick or bandleader might be enough if you add “Guest” and “Audience Member.” Even following the news convention of starting a line with >> when there is a new speaker would be a big help on many shows.
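The >> convention is simple enough that it could even be automated. Here is a minimal sketch (a hypothetical function, not the API of any real captioning software) of the logic: prefix a caption line with >> only when the speaker changes, and add a name tag for the handful of known speakers such as the host:

```python
# Hypothetical sketch of the ">>" speaker-change convention described above.
# Not taken from any real captioning package.

def format_captions(cues, known_speakers=()):
    """cues: list of (speaker, text) pairs; returns display-ready lines."""
    lines = []
    previous = None
    for speaker, text in cues:
        if speaker != previous:
            # Known speakers (e.g., the host) get a name tag;
            # everyone else just gets the >> change-of-speaker marker.
            tag = f">> {speaker.upper()}: " if speaker in known_speakers else ">> "
            lines.append(tag + text)
        else:
            lines.append(text)
        previous = speaker
    return lines

lines = format_captions(
    [("Host", "Welcome back."),
     ("Guest", "Thanks for having me."),
     ("Guest", "It's great to be here."),
     ("Audience Member", "What about webinars?")],
    known_speakers={"Host"},
)
# lines[0] -> ">> HOST: Welcome back."
# lines[2] -> "It's great to be here."  (same speaker, no marker)
```

Even this crude middle ground tells a deaf viewer when the voice has changed, which is exactly the information lip reading can’t supply when the speaker is off-screen.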

Mulvany went on to add, “Europeans use color to indicate who is speaking, so if that has been proven to work there, it would seem very useful here, too.” I’ve raised this question with captioners in the past, and met with a great deal of resistance, but I’m not entirely sure why.

Quite some time ago, I was doing some work with the BBC. They assigned colors to each of the anchors on the show, and used white text for everyone else. Once the speaker IDs were properly defined in the captioning software, the entire process was automatic. We’ve had that capability in U.S. captioning software for over 20 years, yet I know of no one who uses it.

Non-verbal Cues

In the 1970s and 80s, when someone asked me the difference between closed captioning and subtitling, I had two easy differences to point out. The first was that captions could be turned on and off and subtitles couldn’t. The second was that captions included non-verbal cues for deaf/HoH people (e.g., “gunshot” or “footsteps approaching” or “Beethoven’s Fifth Symphony playing softly”).

This seems to have tapered off in recent years, and consumers who don’t understand it may actually complain about it, as we saw in January 2011. President Obama was speaking in Tucson at a memorial service, and someone happened to photograph the captioning on the Jumbotron just when the line [APPLAUSE] appeared. A blogger named Jim Hoft manufactured quite a bit of outrage by claiming that the captioner was asking audience members to applaud rather than indicating that they already were. He was shouted down rather swiftly, but the lesson remains: there are people who don’t understand why non-verbal cues are included in the captions.

Some broadcasters or captioners may be omitting non-verbal cues on purpose, but that’s not always how the deaf viewers see it.

“There just seem to be variations based on how diligent people are about doing their jobs,” said Willard. “I do see shows that give a lot of clues about background noise and others that don’t. Seems to come down to how much they care.”

Sometimes the deaf and HoH audiences ask for things that may not be practical to provide. “I think it’s probably not possible for realtime captioners to provide all the non-verbal information that’s desirable,” Mulvany said, “but I do think it’s very important to indicate when the tone of voice is sarcastic or ironic.”

Is There an Answer?

The shift in captioning focus isn’t all bad. Bravin noted, “Captioning has become more or less mainstream, so the deaf and HoH focus is pretty much gone, but it helps force the captioning issue because there’s a legal requirement.”

Currently, television stations in the nation’s top 25 markets are required to provide realtime captioning for newscasts, but all other stations can use TelePrompTer-based captioning. Everyone I spoke to in the deaf/HoH communities agreed that upgrading the rest of the nation to realtime would be a great start.

“It’s been decades and I’m used to it, but the captioning of local news is a pain in the neck if you’re not in one of the big markets that requires real-time captioning,” said Tom Willard.

Training more new captioners is another issue. Obviously, the law of supply and demand would indicate that having too many captioners would drive down pay in a market that’s already seen dramatic declines in hourly rates in the last two decades. But consumers are concerned.

“The quality of the captioning is likely to get worse as the demand for captioning grows simply because there are not enough high-quality captioners out there,” Bill Graham commented. Graham isn’t just looking at television, though.

“All these webinars that are proliferating for example: few are captioned,” he continued. “If there is a webinar to help people get ahead in their jobs, what happens is that deaf people get farther behind. This is going to be a BIG problem in the future: news vs. livelihood; entertainment vs. education and jobs.”

And, finally, Willard echoed a common theme when he was speaking of disappearing (prescripted) captions and said, “I really resent that it is my job to be a compliance officer, that it is up to me to have to complain about it to my cable company.”

Bravin agreed: “It’s too much of a hassle to file a complaint, and then with no complaints it’s harder for the FCC to enforce quality.”

Should the FCC be legislating caption quality? Should broadcasters be working with deaf/HoH consumers to improve captioning? Questions like these can’t be resolved by captioners or captioning companies, but being aware of the issues that affect the lives of deaf and hard-of-hearing people can help keep you focused on the people who need you most.

On March 21, 2014, I gave a TEDx talk entitled, “Does Closed Captioning Still Serve Deaf People?” It covers some of the same quality issues addressed in this article.
