I wrote this article in 1999, before the switchover from analog to digital TV. It first appeared in print in the January 2000 issue of Nuts & Volts Magazine. At the time, very little technical information about closed captioning had been published for do-it-yourselfers.
There’s more in your TV signal than just video and audio. Closed captions, V-chip information, time of day, program and network information, Internet links, and more lurk within the broadcast signal just waiting for you to pull it all out.
Remember those old TV sets with the “vertical hold” knob? If you turned the knob a bit, you’d see parts of two frames, with a black bar between them. That black bar is the VBI, or vertical blanking interval. It actually consists of the first 21 scan lines of the picture, although what’s contained there is non-picture data and synch signals.
In 1976, the FCC reserved line 21 of the VBI for closed captioning. Since mid-1993, all television sets 13″ and larger have been required to have caption decoders built in. If your TV is newer than that, you can press the “CC” button on your remote or choose “captioning” from your on-screen menu, and watch the program audio rendered as text on the screen.
With the recent proliferation of inexpensive TV tuner cards for computers, you now have an easy way to get this data into your computer and manipulate it.
What’s actually in there
Each frame of a television picture is made up of two interlaced fields, and line 21 appears once in each of them. Each field of line 21 carries a single stream of data made up of different types of data packets. The bandwidth was kept very low to make the data as robust as possible, so each field of each frame can contain only two 7-bit characters. Since video is transmitted at 30 frames per second, that gives us 60 characters per second in each field.
Field 1 of line 21 contains two captioning channels (CC1 and CC2) and two “text” channels (TEXT1 and TEXT2). All four of these data channels share that 60 cps data stream, and the information is sorted out using packet headers. Field 2 contains a matching set of data channels (CC3, CC4, TEXT3, and TEXT4), and can also contain extended data services (XDS) packets.
If you consider a “word” to be five characters plus a space, then we have 600 words per minute of bandwidth in field 1. Theoretically, this should be plenty for two caption channels, but when overhead, positioning information, attribute data, and the two text channels are factored in, it may not be enough. On top of all that, dialog comes in bursts, and those bursts are likely to be synchronized to happen on both caption channels at once. For this reason, programs with two caption channels will typically put the second caption stream into CC3, which gives each caption stream its own 60 cps of bandwidth. An example is CBS’ 60 Minutes, which puts English captions in CC1 and Spanish captions in CC3.
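The bandwidth figures above can be checked with a few lines of arithmetic. This is only a sketch; the constant names are made up for illustration, and the values come straight from the text.

```python
# Figures taken from the article text; the names are illustrative only.
CHARS_PER_FIELD_PER_FRAME = 2   # two 7-bit characters per field, per frame
FRAMES_PER_SECOND = 30          # nominal frame rate used in the article
CHARS_PER_WORD = 6              # five characters plus a space

chars_per_second = CHARS_PER_FIELD_PER_FRAME * FRAMES_PER_SECOND
words_per_minute = chars_per_second * 60 // CHARS_PER_WORD

print(chars_per_second)   # 60
print(words_per_minute)   # 600
```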
The character set used for this data is a slightly modified 7-bit ASCII. All of the standard alphanumeric characters are where you would expect them, but some accented letters have been relocated into the hex 20-7F range. The full character set can be found in my Closed Caption FAQ.
Closed caption and text data
The vast majority of programming in the U.S. uses only CC1 for caption data. Until recently, few programs actually had captions, but the Telecommunications Act of 1996 makes captioning mandatory. As of January 2000, the first milestone in the Telecom Act requires a minimum of 5 hours per day on each channel, and some channels caption much more than that, so there’s plenty of caption data out there.
Although the caption data specification (EIA-608) allows for italicizing, underlining, flashing (blinking), and various foreground colors, the only attribute used with any regularity is italics. Captioners typically use italics to designate an off-screen speaker, a narrator, or a sound effect.
If you’re going to save captions to a text file, then you’ll probably want to do your own wordwrapping. Most closed captioning starts a new line at the end of every sentence, and the 32-character line width is shorter than you’d want for most applications. The typical flag for “change of speaker” is a pair of greater-than symbols at the beginning of a line, which may or may not be followed by a speaker identification.
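As a rough sketch of that rewrapping, the function below joins caption rows into one paragraph per speaker turn and wraps them to a wider column. The function name, the 72-column default, and the sample lines are assumptions made for illustration; real caption text will need more cleanup than this.

```python
import textwrap

def rewrap_captions(lines, width=72):
    """Join short caption rows into paragraphs, one per speaker turn."""
    turns = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.startswith(">>") or not turns:
            turns.append(text)          # ">>" marks a change of speaker
        else:
            turns[-1] += " " + text     # continuation of the current turn
    # Wrap each turn to the wider column width.
    return [textwrap.fill(turn, width=width) for turn in turns]

sample = [
    ">> Reporter: GOOD EVENING.",
    "WE BEGIN TONIGHT",
    "WITH THE WEATHER.",
    ">> THANKS.",
]
for paragraph in rewrap_captions(sample):
    print(paragraph)
```

Keeping the “>>” marker at the start of each paragraph preserves the speaker changes in the saved file.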
In most cases, the only service in field 1 with data in it will be CC1. If you aren’t using a card that sorts out the data services for you, there’s an easy way to deal with the raw byte stream in that CC1-only situation. Just take any block of consecutive bytes less than 20h, and replace them with a single space. You can use a simple lookup table to do the substitutions where the character set deviates from US-ASCII.
Caution: If you try this when there’s data in CC2, TEXT1, or TEXT2, it will be interspersed in your CC1 data, and will make everything totally unreadable. Your best bet is to look for a TV tuner card that separates the data services for you.
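Here is a minimal sketch of that CC1-only shortcut. It assumes the raw bytes may still carry a parity bit in bit 7 (so it masks each byte to 7 bits), and the substitution table is written from my recollection of the EIA-608 character set, so check it against the published table before relying on it. The function and table names are hypothetical.

```python
import re

# Positions where the caption character set differs from US-ASCII.
# Assumption: listed from memory of EIA-608; verify against the spec.
CAPTION_OVERRIDES = {
    0x2A: "á", 0x5C: "é", 0x5E: "í", 0x5F: "ó", 0x60: "ú",
    0x7B: "ç", 0x7C: "÷", 0x7D: "Ñ", 0x7E: "ñ", 0x7F: "█",
}

def cc1_bytes_to_text(raw: bytes) -> str:
    """Turn a raw CC1-only byte stream into readable text."""
    # Assumption: bit 7 is a parity bit, so keep only the low 7 bits.
    data = bytes(b & 0x7F for b in raw)
    # Collapse each run of bytes below 20h into a single space.
    data = re.sub(rb"[\x00-\x1f]+", b" ", data)
    # Apply the lookup table wherever the set deviates from US-ASCII.
    return "".join(CAPTION_OVERRIDES.get(b, chr(b)) for b in data).strip()

print(cc1_bytes_to_text(b"\x14\x14HELLO\x00\x00WORLD"))
```

This follows the shortcut literally. As I understand EIA-608, control codes are two-byte pairs whose second byte is 20h or higher, so a stricter filter would also drop the byte that follows a control byte.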
Closed caption data may be positioned anywhere on the screen, and there can’t be more than four lines (rows) visible at a time. Captions are typically positioned so that nothing critical in the picture will be covered. Text data, on the other hand, is designed to fill the screen (although some televisions limit it to half), completely covering the picture.
Interactive TV and Internet data
Traditionally, the text channels have been used for things like on-screen program guides, but they are rarely used today. The most common use of text is for ITV (Interactive Television) Links.
ITV Links were developed by WebTV and VITAC as a way to transmit Internet URLs (Uniform Resource Locators) in the video signal for set-top boxes. These URLs point to Web pages that contain more information about the program or commercial currently airing. For example, during a program about electronics magazines, the station may insert an ITV link pointing to Nuts & Volts magazine. When that link is broadcast, people with WebTV Plus set-top boxes would see an Internet icon in the corner of their TV screen. They could then press a key on their WebTV controller, and be taken directly to the Nuts & Volts home page, or whatever page was indicated.
The ITV link itself is broadcast in TEXT2, using ISO-8859-1 (Latin-1, a superset of US-ASCII) characters rather than the modified closed caption character set described above. It consists of a URL enclosed in angle brackets, an optional series of attributes in square brackets, and a checksum.
The only attributes you’re likely to care about if you’re parsing ITV links are the URL and the name of the link. To find the name, scan for the text “[name:” (or just “[n:”), and parse to the next closing square bracket.
The URL field is not limited to only Web addresses, so if you’re using these links, be prepared to deal not only with http links, but with mailto, news, and other link types as well. For example, an ITV link might look like this:
<mailto:firstname.lastname@example.org>[t:s][name:Email the Author] [expires:20000521T115959][CE8A]
The [t:s] is a “type” field, the expiration date tells you how long the link will be good (May 21, 2000 at 11:59:59 in this example), and the final [CE8A] is a checksum (see Internet RFC 1071 for details).
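A link like the one above can be pulled apart with a couple of regular expressions. The sketch below is an illustration, not a full parser: the function name and the returned dictionary layout are my own, it treats any bracketed field without a colon as the checksum, and it extracts the checksum without verifying it.

```python
import re
from datetime import datetime

# A URL in angle brackets followed by any number of [bracketed] fields.
LINK_RE = re.compile(r"<([^>]*)>((?:\s*\[[^\]]*\])*)")

def parse_itv_link(text):
    """Split an ITV link into URL, attributes, name, expiry, and checksum."""
    m = LINK_RE.search(text)
    if m is None:
        return None
    link = {"url": m.group(1), "attrs": {}, "checksum": None}
    for field in re.findall(r"\[([^\]]*)\]", m.group(2)):
        key, sep, value = field.partition(":")
        if sep:
            link["attrs"][key.lower()] = value
        else:
            link["checksum"] = field    # bare field, assumed to be the checksum
    attrs = link["attrs"]
    link["name"] = attrs.get("name", attrs.get("n"))   # "[name:" or "[n:"
    expires = attrs.get("expires")
    link["expires"] = (
        datetime.strptime(expires, "%Y%m%dT%H%M%S") if expires else None
    )
    return link

sample = ("<mailto:firstname.lastname@example.org>[t:s][name:Email the Author] "
          "[expires:20000521T115959][CE8A]")
link = parse_itv_link(sample)
print(link["url"], link["name"], link["expires"], link["checksum"])
```

Because the URL is returned as an opaque string, the caller decides what to do with http, mailto, news, and other schemes.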
Extended data services
The extended data services provide information about the current program, TV station, and network. Unlike the caption and text data, they are packets rather than continuous streams of data.
The XDS packet most likely to change the world is the time-of-day packet. VCRs and TVs can use it to set their own clocks, eliminating the “blinking 12:00” phenomenon so common in non-techie households. Other XDS packets include:
- Name, length, and start time of current show
- Type of show, based on a set of category codes
- Program content advisory (see “V-chip data” below)
- Network name
- Station name and number
- National weather service warning codes
To read XDS information, scan the data stream from line 21 field 2. The start code for an XDS packet is a byte less than 0Fh followed by a packet type byte. The end code is a 0Fh byte. As an example, if you wished to set your computer’s clock from the TV signal, you’d scan for a packet starting with 07h 01h, as in Figure 1.
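The scan described above might look something like the following sketch. It follows only the description in this article: a start byte below 0Fh, a type byte, and everything up to the next 0Fh as payload. The function name is hypothetical, 00h is skipped on the assumption that it is padding, and anything the full specification adds around the packet (such as a checksum, or captions interleaved with the packet) is not handled.

```python
XDS_END = b"\x0f"

def extract_xds_packets(field2: bytes):
    """Return (start_byte, type_byte, payload) tuples found in the stream."""
    packets = []
    i = 0
    while i < len(field2) - 1:
        start = field2[i]
        # Assumption: 00h is padding, so start codes are 01h through 0Eh.
        if 0x01 <= start < 0x0F:
            end = field2.find(XDS_END, i + 2)
            if end == -1:
                break                       # packet not complete yet
            packets.append((start, field2[i + 1], field2[i + 2:end]))
            i = end + 1
        else:
            i += 1
    return packets

stream = bytes([0x00, 0x00, 0x07, 0x01, 0x45, 0x52, 0x0F, 0x00])
for start, ptype, payload in extract_xds_packets(stream):
    print(f"{start:02X}h {ptype:02X}h payload={payload.hex()}")
```

Filtering the result for a start byte of 07h and a type byte of 01h picks out the time-of-day packet.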
There are a few oddities about this packet that need to be explored. First, the seconds. Rather than transmitting a whole byte for the seconds, the Z bit is set to 1 whenever the seconds are zero. This means that setting the time could take as long as a minute, while you wait for the seconds to tick over. You could also just watch for the minute value to change, and use that as your “seconds = 0” indicator. The Z bit allows this process to be stateless.
All times are UTC (also known as GMT). You need to know the time zone you’re in to set your local clock. If the D bit is on, it is daylight saving time.
To set the date, add 1990 to the value of the year bits. Yes, this means the system will break down in 2054, but the broadcast industry expects everyone to be switched over to DTV by then, where this mechanism is different. If the date shows as March 1, but your time zone indicates that your clock should be set a day earlier, you can use the L flag to determine if it is a leap year. If the L flag is on (one), then the date is February 29. Otherwise, it is February 28.
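Once the fields have been pulled out of the packet, the conversion to local time is straightforward on a computer. The sketch below assumes the year, month, day, hour, and minute have already been extracted as integers (the bit layout from Figure 1 is not reproduced here), that the D bit simply adds one hour to local standard time, and that the function and parameter names are mine. A general-purpose date library handles the February rollover by itself, so the L flag matters mainly to devices without calendar logic.

```python
from datetime import datetime, timedelta

def xds_to_local(year_bits, month, day, hour, minute,
                 utc_offset_hours, dst=False):
    """Convert decoded time-of-day fields (UTC) to local clock time."""
    utc = datetime(1990 + year_bits, month, day, hour, minute)
    # Assumption: the D bit means "add one hour to local standard time".
    offset = utc_offset_hours + (1 if dst else 0)
    return utc + timedelta(hours=offset)

# 02:00 UTC on March 1, 2000, seen from a zone eight hours behind UTC.
print(xds_to_local(10, 3, 1, 2, 0, utc_offset_hours=-8))

# A 2054 breakdown implies six year bits, so the last year is 1990 + 63.
print(1990 + 0b111111)
```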
If you wish to decode and interpret these packets further, you’ll want a copy of the EIA-608 specification (see the sidebar, “Where to get more information” below).
AUTHOR NOTE: This information is also available in The Closed Captioning Handbook, available April 2004 from Focal Press
V-chip data
XDS is also how the infamous V-chip gets its data (see the sidebar, “What’s a V-chip?” below). The V-chip spec supports four different rating systems, although any one program can only be rated using one system.
- MPAA is the rating system you’re used to from the movies (G, PG, PG-13, R, NC-17, X).
- US TV Parental Guidelines is the new system developed specifically for V-chip (TV-Y, TV-Y7, TV-G, TV-PG, TV-14, TV-MA).
- Canadian English is used throughout all of English-speaking Canada.
- Canadian French is used in Quebec.
Let’s look at the anatomy of a typical V-chip packet. It will always begin with the two-byte pair 01h, 05h. The meaning of the next two bytes varies depending on the rating system. Since the US TV Parental Guidelines is the system you’ll see the most, we’ll use that one. The next two bytes would look like Figure 2:
The D, V, S, and L bits are flags that further refine the rating. The D flag indicates sexually suggestive dialog, V is violence, S is sexual situations, and L is strong language.
Like all other XDS packets, the parental guidelines packets must end in 0F hex. To put this all together, a program rated TV-PG-V would have a V-chip packet of 01h 05h 48h 64h 0Fh.
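To make the worked example concrete, here is a hedged sketch of a decoder for US TV Parental Guidelines packets. The bit positions are assumptions based on my reading of EIA-608, since the figure is not reproduced here: bits 3 and 4 of the first data byte select the rating system, bit 5 of that byte is D, and the second data byte carries the age rating in its low three bits with L, S, and V in bits 3, 4, and 5. They are consistent with the 48h 64h example, but verify them against the specification. The function and table names are hypothetical.

```python
# Age ratings indexed by the low three bits of the second data byte
# (assumed ordering; index 0 means no rating).
US_TV_RATINGS = ["(none)", "TV-Y", "TV-Y7", "TV-G", "TV-PG", "TV-14", "TV-MA"]

def decode_us_tv_rating(packet: bytes):
    """Return (rating, flags) for a US TV Parental Guidelines packet."""
    if len(packet) < 5 or packet[0] != 0x01 or packet[1] != 0x05:
        return None                     # not a content advisory packet
    if packet[4] != 0x0F:
        return None                     # missing end code
    b1, b2 = packet[2], packet[3]
    if (b1 & 0x18) != 0x08:
        return None                     # assumed selector for other systems
    level = b2 & 0x07
    if level >= len(US_TV_RATINGS):
        return None
    flags = ""
    if b1 & 0x20:
        flags += "D"                    # suggestive dialog
    if b2 & 0x08:
        flags += "L"                    # strong language
    if b2 & 0x10:
        flags += "S"                    # sexual situations
    if b2 & 0x20:
        flags += "V"                    # violence
    return US_TV_RATINGS[level], flags

print(decode_us_tv_rating(bytes([0x01, 0x05, 0x48, 0x64, 0x0F])))
```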
Once you have figured out how to read and interpret this data, what do you do with it? You could:
- Make transcripts of your favorite shows. Note that this information is a copyrighted part of the video, and you can’t sell it or post it on your Web site.
- Make a smart “TV Agent” that runs in the background and tells you when there’s something interesting to watch (see the sidebar, “Your own TV-watching agent” below).
- Track Internet links. When an ITV link appears, automatically feed it to your Web browser so you can see what’s related to your current show.
- Set the time on your computer.
- Watch for weather alerts in XDS. You could tie this to audible alerts, or even have your computer use the modem to call your pager or cell phone. Don’t rely on getting your alerts here, though, because few stations actually broadcast them.
- Collect song lyrics. Not that many music videos are captioned today, but the number is increasing steadily as the Telecommunications Act mandate begins to take effect. Again, be careful of copyright considerations here.
I even found someone who had built a “commercial killer” by detecting patterns in line 21 data that usually indicate the start and end of commercials. His would mute the volume on the TV when it detected a commercial, but yours could do whatever you wish.
Good luck, and have fun mining line 21. If you come up with an interesting application for caption data on your computer, email me and let me know!
SIDEBAR: Where to get more information
If you want to get serious about decoding and interpreting captioning and other line 21 data, you’ll want to pick up the appropriate standards documents. You can get them from:
Global Engineering Documents
15 Inverness Way East
Englewood, CO 80112-5776 USA
Phone: 800/854-7179 (U.S. and Canada), 303/397-7956 (International)
Email: email@example.com
Web: global.ihs.com
Be prepared to pay, though. The base document, EIA-608, is over $100, and there are auxiliary documents you would also need.
The author of this article has written a book called Inside Captioning. It does not have detailed instructions for decoding line 21 data, but it does have extensive information about the industry, the technology, and the history of captions.
AUTHOR NOTE: Inside Captioning is out of print. The new Closed Captioning Handbook is a far more comprehensive guide, filled with technical data right down to the bits and bytes.
SIDEBAR: What’s a V-chip?
Don’t try tearing apart your television set looking for the V-chip. You won’t find it. Even though all televisions must contain a “V-chip” now by law, there really is no such thing.
The data for the V-chip, as the article explains, is simply XDS packets containing parental content advisories. Since the TV must contain circuitry to interpret captioning information in the VBI, the V-chip capabilities were added to the captioning chip.
The V-chip, short for “violence chip,” allows parents to control what shows their children can watch. To use this capability, you set filters on your TV. Depending on the rating system being used, you can get fairly detailed. For example, you might choose to allow anything rated TV-14, unless it contains excessive violence (TV-14-V). The set will then monitor the incoming signal, and if it detects anything rated TV-14-V or higher, the audio, video, and captions will be blocked.
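The TV-14 example can be expressed as a small comparison. This is a simplified sketch of the idea rather than how any particular television implements it: the function name and parameters are hypothetical, and it applies the content flags only at the limit rating itself.

```python
# US TV Parental Guidelines, from least to most restrictive.
US_TV_ORDER = ["TV-Y", "TV-Y7", "TV-G", "TV-PG", "TV-14", "TV-MA"]

def is_blocked(rating, flags, limit_rating, limit_flags=""):
    """Decide whether a program should be blocked.

    rating/flags describe the program (for example "TV-14", "V").
    limit_rating is the highest rating allowed, and limit_flags lists
    content flags that are not allowed at that rating.
    """
    level = US_TV_ORDER.index(rating)
    limit = US_TV_ORDER.index(limit_rating)
    if level > limit:
        return True                     # rated higher than the limit
    if level == limit:
        return any(f in flags for f in limit_flags)
    return False                        # rated below the limit

# Allow TV-14 unless it carries the violence flag.
print(is_blocked("TV-14", "V", "TV-14", "V"))   # True
print(is_blocked("TV-14", "L", "TV-14", "V"))   # False
print(is_blocked("TV-MA", "", "TV-14", "V"))    # True
```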
The visible content advisory icon that appears in the corner of your screen at the beginning of a program is not generated by your television, and isn’t dependent on the V-chip data. The V-chip data is also retransmitted constantly so that if you change channels, it will detect the new rating quickly.
SIDEBAR: Your own TV-watching agent
Once your computer can read line 21 data from the VBI, an obvious application would be a program to “watch” a specified channel and notify you when something of interest comes up.
Such a program would require a triggering mechanism, such as recognition of a word or phrase from a keyword list. Make sure your keyword checking is not case-sensitive, as most, but not all, captioning is done in uppercase. You could also trigger on ITV links or XDS data.
Once you’ve defined your triggers, you need to define the action. Do you want the program to notify you, using audible alerts? Do you want it to activate a full-screen TV picture on your computer, with the volume turned up? Do you want it to save you a transcript for later? Turn on a VCR? Send you an email? Page you?
If you’re going to have the program save caption data, make sure you back up a bit from the place where the keyword was recognized, so that you get the whole story in context.
You’ll also need a trigger to turn it off. The easiest way to do this is with a timer. If you do that, you should reset the timer every time a keyword is triggered, so that you’ll get all of a long story. You might also want to be more liberal with your keywords in this second trigger.
For example, if you’re scanning CNN for mention of Apple Computer, you wouldn’t want to use the keyword “apple” as a trigger, or you’d get far too many false hits. Once you’ve triggered on Apple Computer, though, you would probably want any mention of the word “apple” to keep the recorder running.
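The pieces of this sidebar (case-insensitive triggers, backing up for context, a timer that resets, and a more liberal keyword list once recording has started) fit together roughly as in the sketch below. The class and parameter names are invented for illustration, and timestamps are passed in as plain seconds so the logic is easy to test; a real agent would take them from a clock.

```python
from collections import deque

class CaptionAgent:
    """Record caption lines around keyword hits (illustrative sketch)."""

    def __init__(self, start_phrases, keep_words,
                 hold_seconds=120, context_lines=5):
        self.start_phrases = [p.lower() for p in start_phrases]
        self.keep_words = [w.lower() for w in keep_words]
        self.hold_seconds = hold_seconds
        self.context = deque(maxlen=context_lines)  # lines before a trigger
        self.stop_at = None                         # when recording ends
        self.transcript = []

    def feed(self, now, line):
        """Process one caption line; return True while recording."""
        text = line.lower()                         # case-insensitive match
        recording = self.stop_at is not None and now < self.stop_at
        if not recording:
            self.stop_at = None
        if any(p in text for p in self.start_phrases):
            if not recording:
                # Back up a bit so the story is saved in context.
                self.transcript.extend(self.context)
                self.context.clear()
            recording = True
            self.stop_at = now + self.hold_seconds
        elif recording and any(w in text for w in self.keep_words):
            # Looser keywords keep the recorder running once triggered.
            self.stop_at = now + self.hold_seconds
        if recording:
            self.transcript.append(line)
        else:
            self.context.append(line)
        return recording

agent = CaptionAgent(["apple computer"], ["apple"], hold_seconds=60,
                     context_lines=2)
agent.feed(0, "AN APPLE A DAY")                  # no trigger on "apple" alone
agent.feed(20, "Apple Computer REPORTED EARNINGS")
print(agent.transcript)
```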