Blog Archives

Steno by the Numbers

This article first appeared in the March 2005 issue of the Journal of Court Reporting. Despite being over ten years old, it might be something fun for any court reporters out there who are also math geeks. I know of at least one…

Do you have an analytical mind? Has it ever caused you to look down at your steno machine and wonder how many different strokes there are? How you could measure writing speed? Or how many different ways there were to misstroke “gubernatorial”? Then read on. In addition to having some fun with statistics, we’ll look at a few numbers that just might change the way you think about court reporting.

The Steno Keyboard

If you had a steno keyboard with two keys (call them A and B), you could write four different strokes: A, B, AB, and nothing at all. Since not pressing any keys isn’t really a stroke, we’ll subtract one and call it three possibilities. Add a third key (called C, perhaps) and the number of possible strokes jumps to eight (A, B, C, AB, AC, BC, ABC, and no keys), minus one for that “no keys” possibility. Each key you add doubles the number of strokes. A mathematician would say that a keyboard with K keys could generate 2^K - 1 different strokes.
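
For readers who would rather let a computer do the counting, here is a minimal Python sketch that lists every stroke on a tiny keyboard and checks the 2^K - 1 pattern. The function name `enumerate_strokes` is made up for this illustration; it is not part of any CAT software.

```python
from itertools import combinations

def enumerate_strokes(keys):
    """Return every non-empty combination of the given keys as a string."""
    strokes = []
    for size in range(1, len(keys) + 1):
        for combo in combinations(keys, size):
            strokes.append("".join(combo))
    return strokes

two_key = enumerate_strokes("AB")
three_key = enumerate_strokes("ABC")

print(two_key)         # ['A', 'B', 'AB']
print(len(three_key))  # 7, which is 2**3 - 1
```

Listing strokes this way is only practical for toy keyboards; for a real one, the formula is the way to go.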

A steno keyboard has seven keys on the left and ten on the right, plus an asterisk and four vowels, for a total of 22. Following our pattern from above, your steno machine has 2^22 - 1 possible strokes, a total of 4,194,303. If you use the number bar, your possibilities almost double again. I say “almost” because pressing the number bar by itself doesn’t count as a stroke on most machines, so you end up with 2^23 - 2 possible different strokes, or 8,388,606.
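
The arithmetic above is easy to double-check. This is a small sketch, with helper names of my own choosing, that plugs the key counts from the paragraph into the two formulas.

```python
def stroke_count(keys):
    """Non-empty key combinations on a keyboard with `keys` keys."""
    return 2 ** keys - 1

def stroke_count_with_number_bar(keys_without_bar):
    """Add a number bar, then drop 'no keys' and 'number bar alone'."""
    return 2 ** (keys_without_bar + 1) - 2

# Key counts taken from the text: left bank, right bank, asterisk, vowels.
LEFT, RIGHT, ASTERISK, VOWELS = 7, 10, 1, 4
total_keys = LEFT + RIGHT + ASTERISK + VOWELS

print(total_keys)                                       # 22
print(f"{stroke_count(total_keys):,}")                  # 4,194,303
print(f"{stroke_count_with_number_bar(total_keys):,}")  # 8,388,606
```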

Obviously, there are certain strokes you can’t physically hit. Do you use both banks (STKPWHR-FRPBLGTS) as a speaker ID, perhaps for “The Court”? Try to hit that stroke with a number bar in it. There may be someone out there with a stroke in their dictionary ending in -TZ (without a -D or -S). If so, they’ll probably email me, but until then, I’ll call that one an impossible stroke, too. I tried to work out all of these impossible (or at least highly improbable) strokes, and I came up with fewer than 10,000 of them. Even if we stretch the definition to include strokes that are possible, but very difficult to hit cleanly (-FBLS, for example), eight million possible different strokes looks like a pretty good estimate.

Misstrokes and Your Dictionary

If there’s a word you frequently misstroke, what’s the easiest way to deal with it? Put the misstroke in your dictionary! In the early days of CAT, you had to watch your dictionary size carefully. Many systems had upper limits on dictionary size, and a large dictionary could dramatically slow down your translation. With today’s computers, massive dictionaries can easily be held in memory, and dictionaries with over 100,000 entries have become routine.

When I taught CAT back in the 1980s, I encouraged my students to keep their dictionaries lean and mean for fast translation and easy editing. Today, a few extra entries won’t hurt anything, and I even recommend adding misstrokes preemptively. When you’re adding a word or phrase to your dictionary, think how you might misstroke it, and put those misstrokes in during your prep. That way, your realtime translation just might come out cleaner. It’s not feasible, however, to add all of the possible ways you might misstroke something.

How many ways are there to misstroke something? Let’s take a simple stroke like KATS. You need three fingers for that stroke: left ring finger for the K-, left thumb for the A, and right little finger for the -TS. In figure 1, the green circle (position 5) shows where your right little finger should press for this stroke, and the eight red circles show what happens as that finger goes off-center. If you press at circle 4, for example, you’ll get -LGTS instead of -TS. The nine possible finger positions generate nine possible steno combinations, eight of which are incorrect. It is also possible that your finger doesn’t go down far enough to register at all. That brings us up to ten combinations, nine of which are errors.

Your finger could, of course, be even farther off than the eight “error positions” shown in figure 1, but that’s unlikely and uncommon enough that we don’t need to consider it. What if your finger is halfway between the correct position and one of the positions I’ve shown as an error? Well, either the key will register on the computer, or it won’t. You can’t get half of an error, and since we’re just counting possible errors here, we don’t have to consider that possibility.

But what of the K- in KATS? Since the two S- keys are effectively one key, does this reduce the possibilities? No, because each position still generates a different stroke. Position 1 produces ST-, position 4 produces STK-, and position 7 produces SK-. There are still ten possible positions (the nine circles plus “no stroke”).

With your thumb, there are only two keys, and nothing else close enough for a likely error, so instead of ten possible combinations, there are only four (A, O, AO, and no vowel at all).

The grand total, then, for the stroke KATS, is 10 x 10 x 4 = 400 possible ways you could write it. One of those is correct, and one (none of the fingers registering at all) can’t be entered in a dictionary, so there are 398 possible misstrokes. And this doesn’t even factor in the possibility of shadowing with another finger!
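
Here is one way to sketch that count in Python. The outcome labels ("correct", "off1", and so on) are placeholders I made up; they stand in for the finger positions in the figures rather than for actual steno output.

```python
from itertools import product

# Ten outcomes per finger: on target, eight off-center positions, or no key registered.
finger_outcomes = ["correct"] + [f"off{i}" for i in range(1, 9)] + ["none"]
# Four outcomes for the thumb; "A" is the intended one for KATS.
thumb_outcomes = ["A", "O", "AO", "none"]

# Every combination of left finger, thumb, and right finger.
all_outcomes = list(product(finger_outcomes, thumb_outcomes, finger_outcomes))

intended = ("correct", "A", "correct")
nothing_registered = ("none", "none", "none")
misstrokes = [o for o in all_outcomes if o not in (intended, nothing_registered)]

print(len(all_outcomes))  # 400
print(len(misstrokes))    # 398
```

The same approach extends to strokes that use more fingers: add one more list per finger to the `product` call, and the total multiplies accordingly.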

Should you enter all 398 of these into your dictionary as misstrokes for cats? Definitely not! Many of those are unlikely (pats, for example), and many others are valid words, such as cat, scat, scats, cot, cots, and cogs. Even with dictionaries of unlimited size, entering all of the possible misstrokes for a word just doesn’t make sense. Instead, enter only the ones that actually happen to you, and don’t conflict with other words.

[Author Note 2015: Most of today’s CAT software includes algorithms for identifying and correcting errors like this automatically. These algorithms are often incorrectly called “artificial intelligence,” but we’ll save that argument for another article!]

Communicating With the Computer

I’ve heard people pondering why it took steno keyboard manufacturers so long to move from the old, slow serial interface to newer, high-speed interfaces like USB. The answer? They didn’t need the speed. The only reason for adding USB to a steno keyboard is that so many new computers don’t have serial ports anymore.

When computers communicate, they break data into chunks. The basic chunk is called a “bit,” and it’s equal to a binary digit. A bit can represent either one or zero. The next bigger chunk is the “byte,” which has eight bits. It can hold a number from 0 to 255. A steno stroke requires 23 bits—one for each key on the steno keyboard—which means it will fit in three bytes. Many interfaces have extra information such as stenomarks, so four bytes is more typical.
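
To make the "one bit per key" idea concrete, here is a sketch that packs a stroke into three bytes. The key order and bit layout are assumptions for illustration only; real steno machines each have their own protocols, and this does not describe any of them.

```python
# Hypothetical bit assignment: one bit per key, in steno order, number bar first.
KEYS = ["#", "S-", "T-", "K-", "P-", "W-", "H-", "R-", "A", "O", "*", "E", "U",
        "-F", "-R", "-P", "-B", "-L", "-G", "-T", "-S", "-D", "-Z"]

def pack_stroke(pressed):
    """Set one bit for each pressed key and return the result as 3 bytes."""
    mask = 0
    for key in pressed:
        mask |= 1 << KEYS.index(key)
    return mask.to_bytes(3, "little")

def unpack_stroke(data):
    """Recover the list of pressed keys from the packed bytes."""
    mask = int.from_bytes(data, "little")
    return [key for i, key in enumerate(KEYS) if mask & (1 << i)]

kats = pack_stroke(["K-", "A", "-T", "-S"])
print(len(KEYS))            # 23
print(len(kats))            # 3
print(unpack_stroke(kats))  # ['K-', 'A', '-T', '-S']
```

Three bytes give 24 bits, so the 23 keys fit with one bit left over. A fourth byte would leave room for the extra information mentioned above.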

Communication speed for computers is measured in bits per second. For some rather complex reasons involving “start bits” and other obscure telecommunications protocols, it usually takes ten bits rather than eight to transmit a byte over a serial port. If a stroke takes four bytes, then it will take 40 bits to send it over a typical serial line.

If you can sustain a speed of five strokes per second, then you’re really moving. If each stroke takes 40 bits to send, then a steno machine would have to communicate with the computer at 5×40=200 bits per second to handle that blazing five stroke per second speed.

Does your computer have a modem? If so, it’s probably a 56K modem, which can transmit approximately 56,000 bits of information each second over a standard serial line to your computer (I’m oversimplifying here, but that’s close enough for our purposes). That’s over 250 times faster than your steno machine needs to communicate, and your serial port is capable of operating much faster than that. Even today, captioners routinely use old slow modems, because they simply don’t need the speed.

USB is a much better way to download huge picture files from your camera, but it is overkill for your steno machine by at least four orders of magnitude.
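
The bandwidth figures from the last few paragraphs can be reproduced in a few lines. The serial and modem numbers come from the text; the USB rate is my assumption (12 million bits per second, the "full speed" rate of early USB), since the article doesn't say which flavor of USB it has in mind.

```python
import math

BYTES_PER_STROKE = 4         # from the text: three bytes of keys plus extras
WIRE_BITS_PER_BYTE = 10      # from the text: start bits and other serial overhead
MODEM_BPS = 56_000           # from the text: a "56K" modem
USB_FULL_SPEED_BPS = 12_000_000  # assumption: early "full speed" USB

def required_bps(strokes_per_second):
    """Bits per second a steno machine needs to send at a given stroke rate."""
    return strokes_per_second * BYTES_PER_STROKE * WIRE_BITS_PER_BYTE

needed = required_bps(5)
modem_headroom = MODEM_BPS / needed
usb_orders = math.log10(USB_FULL_SPEED_BPS / needed)

print(needed)                # 200
print(modem_headroom)        # 280.0
print(round(usb_orders, 2))  # 4.78
```

Under that assumption, USB has between four and five orders of magnitude more capacity than the steno machine needs.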

How Fast is Fast?

Wouldn’t it be nice to have a speedometer on your steno machine to show you how fast you were writing? You wouldn’t want it to face the attorneys, of course, because they’d take it as a matter of pride if they could “red line” your steno machine! The problem here is figuring out exactly how to measure your writing speed.


On a standard typewriter or computer keyboard, which is called a QWERTY keyboard because of the keys at the top left, people use a very simple measurement for words per minute (wpm). If you assume an average English word is five letters plus a space or punctuation mark, then you can just count how many keys you press in a minute, divide by six, and you have your typing speed in wpm.

It’s a bit more difficult in steno. What do you really want to measure? If it’s the speed of your hands, count strokes. If it’s your final output, count letters or actual words. Schools and speed tests tend to count syllables, figuring that it’s a more consistent (and less theory-dependent) measure of what you’re doing.

Using the chart in Mark Kislingbury’s article, Rev Up Your Writing, in the July/August 2004 edition of the JCR, writing at 200 wpm could mean anywhere from three strokes per second (if you use an incredibly brief-heavy theory like Mark’s) on up to five strokes per second (if you write everything out and come back for all of your inflected endings). For comparison, writing 200 wpm on a QWERTY keyboard would require typing 1,200 keystrokes per minute, or 20 keystrokes per second, a virtually impossible feat.
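
The comparison is simple enough to check with a short sketch. The six-keys-per-word rule and the stroke rates come from the text; the function names are mine.

```python
KEYS_PER_WORD = 6  # five letters plus a space or punctuation mark

def qwerty_keystrokes_per_second(wpm):
    """Keystrokes per second needed to type at the given words per minute."""
    return wpm * KEYS_PER_WORD / 60

def words_per_stroke(wpm, strokes_per_second):
    """Average words produced by each steno stroke at a given pace."""
    return wpm / (strokes_per_second * 60)

print(qwerty_keystrokes_per_second(200))   # 20.0
print(round(words_per_stroke(200, 3), 2))  # 1.11 (brief-heavy theory)
print(round(words_per_stroke(200, 5), 2))  # 0.67 (writing everything out)
```

In other words, the brief-heavy writer averages a little more than one word per stroke, while the write-it-out writer needs about three strokes for every two words.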

With the advanced displays available on the latest crop of steno machines, a speedometer showing strokes per second (or strokes per minute) would be easy to add. A words-per-minute speedometer is slightly more complex. It would require a steno machine that translates and can simultaneously count the actual words generated per minute. Either type of speedometer could easily be added to a CAT program, as some have already done.
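
As a rough idea of how a strokes-per-second speedometer might work, here is a sketch that averages over a sliding window of time. The class name, the ten-second window, and the idea of feeding it timestamps are all assumptions of mine, not a description of how any steno machine or CAT program does it.

```python
from collections import deque

class StrokeSpeedometer:
    """Average strokes per second over the most recent window of time."""

    def __init__(self, window_seconds=10.0):
        self.window = window_seconds
        self.times = deque()

    def _trim(self, now):
        # Forget strokes that have fallen out of the window.
        while self.times and self.times[0] <= now - self.window:
            self.times.popleft()

    def record(self, timestamp):
        """Call once per stroke with the time (in seconds) it was written."""
        self.times.append(timestamp)
        self._trim(timestamp)

    def strokes_per_second(self, now):
        self._trim(now)
        return len(self.times) / self.window

meter = StrokeSpeedometer()
for i in range(1, 51):   # 50 strokes, one every fifth of a second
    meter.record(i / 5)

print(meter.strokes_per_second(10.0))  # 5.0
```

A shorter window makes the needle more responsive but jumpier; a longer one smooths out bursts and pauses.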

Your Poor, Abused Steno Keyboard

When my brother and I were working on the design of a steno keyboard, we went to see a parts manufacturer about some custom key stem designs. One of the first questions he asked was, “How many times will this key get pressed?” Wow! What a question!

We approached it by looking at worst-case numbers. A court reporter with a busy schedule who reports five days a week writes a whole lot of strokes. A quick survey of reporters showed that the number of strokes written in a full day varies all over the map, but we settled on 75,000 strokes as being a pretty big number. Assuming a couple of weeks of vacation time, that reporter will work 50 weeks in a year, at 5 days per week, for a total of 250 days in a year. Multiply that by our 75,000 stroke day, and we’ve got 18,750,000 strokes per year. Do your wrists hurt yet?

How long does a steno machine last? While we know reporters using machines that are 20+ years old, CAT reporters have a tendency to upgrade more often than that. Even so, you wouldn’t want to buy a machine that wouldn’t last 20 years, would you? Doing the math, that means your steno machine could easily take over a third of a billion (375,000,000) strokes over the course of a 20-year career!
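
Here is the worst-case estimate as a short sketch, using only the figures given above. The variable names are mine.

```python
STROKES_PER_DAY = 75_000  # the "pretty big number" from the survey
WEEKS_PER_YEAR = 50       # allowing two weeks of vacation
DAYS_PER_WEEK = 5
CAREER_YEARS = 20

work_days_per_year = WEEKS_PER_YEAR * DAYS_PER_WEEK
strokes_per_year = work_days_per_year * STROKES_PER_DAY
career_strokes = strokes_per_year * CAREER_YEARS

print(work_days_per_year)       # 250
print(f"{strokes_per_year:,}")  # 18,750,000
print(f"{career_strokes:,}")    # 375,000,000
```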

A Career on a Disk

In the early days of CAT, dictionary size limitations often came from what could fit on a floppy diskette. PC-based CAT systems started out with hard drives holding 20 megabytes or so. Backing up was important not only to be safe, but because you just couldn’t fit that much information on a hard drive.

Today, instead of backing up dictionaries on floppies, we can just write them to a blank CD, along with everything else we need to save. Hard drives typically hold over 1,000 times what those early drives held. I was computer game shopping with my son-in-law the other day, and he showed me the newest game in his favorite series. On the box, it said that it requires over 11 gigabytes of free space on your hard drive just to install it!

That, of course, got me thinking. With today’s storage capacities, what would it take to back up a reporter’s entire career? We calculated earlier that a stroke of steno takes four bytes. That means our 375 million strokes would require 1.5 billion bytes. As a side note, a billion bytes is not the same as a gigabyte. The metric prefixes mean something a little different in the computer world, as they go up by factors of 1,024 (two to the tenth power) rather than factors of 1,000 (ten to the third power). This gives us:

1 kilobyte (1Kb) = 1,024 bytes
1 megabyte (1Mb) = 1,024 Kb = 1,048,576 bytes
1 gigabyte (1Gb) = 1,024 Mb = 1,073,741,824 bytes
1 terabyte (1Tb) = 1,024 Gb = 1,099,511,627,776 bytes

[Author Note 2015: For more about large & small numbers, see my article “Billions and Billions: A math lesson for NBC.”]

What about the final transcripts? That depends largely on the format you store them in. In the simplest ASCII format, a 250-page job with typical formatting takes about 375,000 bytes to store (your mileage may vary). Going back to our 250 work days per year over a 20-year span, and assuming one such job per work day, that’s 1,875,000,000 bytes of ASCII transcript. Add that to our 1.5 billion bytes of steno, and we get 3,375,000,000 bytes, or about 3.14 gigabytes.

The latest computers these days have the option of writable DVD drives, which have a capacity of about 4.7 gigabytes. This means that one single disk is capable of holding every single steno stroke and every final transcript from an entire 20-year reporting career, with extra space to spare for dictionary backups and digital photos of your favorite attorney clients (just joking on that last one).
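
The storage estimate can be checked with a short sketch. It assumes one 250-page job per work day, which is what the arithmetic above implies, and it uses the 1,024-based gigabyte defined earlier.

```python
BYTES_PER_STROKE = 4
CAREER_STROKES = 375_000_000
BYTES_PER_JOB = 375_000    # a 250-page ASCII transcript
JOBS = 250 * 20            # assumption: one job per work day for 20 years
GIGABYTE = 1024 ** 3       # the 1,024-based gigabyte from the list above
DVD_GIGABYTES = 4.7        # capacity quoted in the text

steno_bytes = CAREER_STROKES * BYTES_PER_STROKE
transcript_bytes = JOBS * BYTES_PER_JOB
total_bytes = steno_bytes + transcript_bytes
total_gigabytes = total_bytes / GIGABYTE

print(f"{steno_bytes:,}")               # 1,500,000,000
print(f"{transcript_bytes:,}")          # 1,875,000,000
print(f"{total_bytes:,}")               # 3,375,000,000
print(round(total_gigabytes, 2))        # 3.14
print(total_gigabytes < DVD_GIGABYTES)  # True
```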

What about audio linkage? Assuming 8-hour days, that 20-year career would involve a mind-numbing 40,000 hours of listening to people argue. A typical computer audio recording using the .WAV file format takes up almost 40Mb per hour, even for low-quality mono recording. Saving all of that audio would require over 1.5 terabytes, which is vastly more information than a DVD can hold. Of course, Web servers with over a terabyte of disk space aren’t uncommon any more, and compressed audio formats like MP3 can shrink that 40Mb per hour down dramatically while simultaneously increasing the quality of the recording.
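
The audio estimate works the same way. This sketch uses the figures from the paragraph above and the 1,024-based units defined earlier.

```python
HOURS_PER_DAY = 8
DAYS_PER_YEAR = 250
CAREER_YEARS = 20
MEGABYTES_PER_HOUR = 40   # low-quality mono .WAV, per the text

career_hours = HOURS_PER_DAY * DAYS_PER_YEAR * CAREER_YEARS
audio_megabytes = career_hours * MEGABYTES_PER_HOUR
audio_terabytes = audio_megabytes / 1024 / 1024   # Mb -> Gb -> Tb

print(f"{career_hours:,}")        # 40,000
print(f"{audio_megabytes:,}")     # 1,600,000
print(round(audio_terabytes, 2))  # 1.53
```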

In the last 20 years, we’ve gone from floppy diskettes holding 360 Kb to DVD+RWs holding 4.7Gb. That’s over 10,000 times the storage capacity. If the pattern holds, then in another 20 years, we shouldn’t have any problem holding the steno, finished transcripts, and audio of a reporter’s entire career on a single disk, or perhaps it will look more like a key fob, a wristwatch, or a credit card.

Perhaps all this recreational number crunching has started your mind racing. Perhaps your eyes have glazed over and the only crunching you’re thinking of is Ben & Jerry’s Heath Bar Crunch Ice Cream. Either way, you just may look at your steno keyboard a little differently tomorrow morning.

SIDEBAR: About the Numbers

In Stephen Hawking’s wonderful book, A Brief History of Time, he says that he was told every equation he used would halve his sales. He managed to cover everything from Heisenberg’s uncertainty principle to string theory using only one equation (E = mc^2). If he could pull that off, I figured I could do no less with this article. Unfortunately, either steno is more complex than I thought, or Professor Hawking is just a better writer, because I really needed to slip in a few formulas. I trust they won’t slow you down.

To write this article, I had to make some assumptions. One of them is that you’re using a traditional steno keyboard, with one initial S key, one asterisk, and one number bar. If you start adding function keys, splitting the initial S, and breaking up the number bar, everything changes. Users of steno keyboards like the Gemini and the various Digitext machines have far more potential strokes, assuming their CAT software supports all of the capabilities of their writer.

The Start of Online Captioning (realtime text transmission)

Closed captioning has been a part of television broadcasting for several decades. For pre-recorded shows, the captions can be added in a studio, carefully typed, proofed, and formatted. In the U.S., this is known as “offline” captioning. For a live show, someone has to type that text as it is spoken, known as “online” or “realtime” captioning. It has traditionally been performed using a stenotype keyboard like court reporters use, and the person typing at breakneck speeds of over 200 wpm is called a stenocaptioner (this is what my wife, Kathy, does for a living).

Realtime captioning technology was first used on a live broadcast during the Academy Awards in 1982, performed by my friend Martin Block. A decade later, it still hadn’t found its way into cyberspace, except in limited private chats. The company my wife and I started (Cheetah Systems) had been playing with the concept of streaming realtime text, but hadn’t had a chance to use it online. The following is an excerpt from my first book (The Court Reporter’s Guide to Cyberspace), with a wee bit of editing to bring it up-to-date and change the writing to first person.

The big unveiling of realtime into cyberspace occurred in November of 1994. U.S. Senator Barbara Boxer of California set up a conference in Washington, D.C., for California business leaders. One of the guests was Vice President Al Gore, speaking on the subject of “Building the Information Superhighway.” When I saw the Vice President’s name on my conference invitation, it seemed like the ideal opportunity to use this technology. At the time, there were some issues with fast text streaming on the Internet, but CompuServe had chat forums that were working well for the purpose. I called Vice President Gore’s office and suggested live streaming of the speech.

Barbara Boxer is not a large woman. At 6'5" tall, I ended up looking like the Jolly Green Giant in this photo with her.


At first, the Vice President’s office resisted the idea of realtiming the speech onto CompuServe. They felt that if it wasn’t being broadcast on television or radio, it shouldn’t be broadcast on CompuServe, either. In actuality, politicians speaking to special interest groups rarely want their words shared with general audiences. Eventually, though, both Senator Boxer and Vice President Gore agreed to have their speeches realtimed.  

Realtime reporter Jack Boenau from Sarasota, Florida, agreed to handle the realtiming. He and I flew to Washington. Richard Sherman (my co-author of The Court Reporter’s Guide to Cyberspace) reserved a virtual conference room in CompuServe’s “CRForum” (the forum for court reporters and captioners).

On the morning of the speech, Jack Boenau and I were present at the Russell Building in Washington, D.C., and the world was at their computers and logged into CRForum’s Conference Room 2, renamed “V.P. Gore Conference” for this historic event. Everybody anxiously awaited the scheduled 12 noon, EST, commencement.

At the last minute, we found that we couldn’t get a modem to connect using an outside line from the Russell Building because all of the building’s phone systems were digital. I ended up — much to the dismay of the Vice President’s security detail — stringing my modem lines behind the stage to be used by Senator Boxer and V.P. Gore, and into a little phone booth in the kitchen. I had to take apart the phone booth and jack in to the phone. The Senate techs weren’t overly enthused about my ad-hoc phone phreaking, either.

Once the hookup was complete, Jack provided entertaining and informative narration to online participants, describing the scene in Washington, the security clearances, the snarling dogs trained to lunge for the jugular at the sound of an unfolding tripod. I had an interesting encounter with one of the bomb-sniffing dogs, but I’ll save that story for another time.

And the speech in Washington was read on computer screens across America as it happened. Here is the beginning of Senator Boxer’s introduction of the Vice President (taken directly from the transcript):  

SENATOR BOXER: One thing I wanted to mention to you, which is terrific, today’s speech by Vice President Al Gore is about building the information superhighway, but the Vice President isn’t just talking, however. The speech, part of the seminar put on by yours truly, is being transmitted live onto CompuServe, one of the information services that make up the prototypical information superhighway. So as we sit here right now, because of these terrific people, with about a two-minute lag, they will be receiving the speech. Oh, I’m sorry, a two-second interval. They will be receiving the speech. See, I have to catch up. You’re so far ahead.

The last comment was directed at Jack and me, as we gave a thumbs-up for the correction on delay time. How fitting it was that the first major national “broadcast” of this type was on the subject of the information superhighway! In the words of the Vice President himself during this address:

The changes that are now underway within our society and within our civilization as a result of new information technologies is very difficult to overstate. These changes are of the same order of magnitude as those changes which accompanied the invention of the printing press, except that these changes will not be strung out over centuries. Instead, the impact will be telescoped into only a few years.

Jack Boenau (on the right) and I hoping that our power suits and 90s haircuts will keep that cutting-edge technology working. The tea and apple pie was to keep us working.


From around the country and the world, reporters and lay persons witnessed a remarkable event. Sitting thousands of miles away, everyone could participate in an event otherwise accessible only by those in attendance. Those online could watch the words of the Vice President scroll across computer monitors, and although no questions were entertained from the general public during this session, individuals sitting at computer keyboards had the capability to ask questions, offer input, or cast votes in an election situation, if permitted to do so.

Everything worked beautifully and an entirely new arena opened up for realtime reporters through the melding of two technologies: online communications technology and this latest breakthrough in reporting technology.

Today, this kind of event is taken for granted. In 1994, it was groundbreaking. In fact, it became the backbone of an Internet broadcasting company (Cheetah Broadcasting) that my brother and I ran for several years, performing live transcription of events onto CompuServe, America Online, Internet chat rooms, and eventually dedicated web applications.
