This article first appeared in the March 2005 issue of the Journal of Court Reporting. Despite being over ten years old, it might be something fun for any court reporters out there who are also math geeks. I know of at least one…
Do you have an analytical mind? Has it ever caused you to look down at your steno machine and wonder how many different strokes there are? How you could measure writing speed? Or how many different ways there were to misstroke “gubernatorial”? Then read on. In addition to having some fun with statistics, we’ll look at a few numbers that just might change the way you think about court reporting.
The Steno Keyboard
If you had a steno keyboard with two keys (call them A and B), you could write four different strokes: A, B, AB, and nothing at all. Since not pressing any keys isn’t really a stroke, we’ll subtract one and call it three possibilities. Add a third key (called C, perhaps), and the number of possible strokes jumps to eight (A, B, C, AB, AC, BC, ABC, and no keys), minus one for that “no keys” possibility. Each key you add doubles the number of strokes. A mathematician would say that a keyboard with $K$ keys could generate $2^K - 1$ different strokes.
A steno keyboard has seven keys on the left and ten on the right, plus an asterisk and four vowels, for a total of 22. Following our pattern from above, your steno machine has $2^{22} - 1$ possible strokes, a total of 4,194,303. If you use the number bar, your possibilities almost double again. I say “almost” because pressing the number bar by itself doesn’t count as a stroke on most machines, so you end up with $2^{23} - 2$ possible different strokes, or 8,388,606.
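The doubling pattern is easy to check by machine. Here is a minimal Python sketch (the function names are hypothetical) that lists every non-empty key combination for the toy two- and three-key keyboards and compares the count with the formula, then applies the formula to a 22-key keyboard, treating “number bar alone” as a non-stroke.

```python
from itertools import combinations


def brute_force_strokes(keys):
    """Count every non-empty combination of keys by listing them one by one."""
    return sum(1 for size in range(1, len(keys) + 1)
               for _ in combinations(keys, size))


def formula_strokes(num_keys):
    """Each key is pressed or not: 2**K patterns, minus the empty one."""
    return 2 ** num_keys - 1


def strokes_with_number_bar(num_keys):
    """Add a number bar (K + 1 keys), then drop two patterns: no keys at all,
    and the number bar pressed by itself."""
    return 2 ** (num_keys + 1) - 2


# The toy keyboards from the text: A-B and A-B-C.
assert brute_force_strokes("AB") == formula_strokes(2) == 3
assert brute_force_strokes("ABC") == formula_strokes(3) == 7

print(f"{formula_strokes(22):,}")          # 4,194,303
print(f"{strokes_with_number_bar(22):,}")  # 8,388,606
```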
Obviously, there are certain strokes you can’t physically hit. Do you use both banks (STKPWHR-FRPBLGTS) as a speaker ID, perhaps for “The Court”? Try to hit that stroke with a number bar in it. There may be someone out there with a stroke in their dictionary ending in -TZ (without a -D or -S). If so, they’ll probably email me, but until then, I’ll call that one an impossible stroke, too. I tried to work out all of these impossible (or at least highly improbable) strokes, and I came up with fewer than 10,000 of them. Even if we stretch the definition to include strokes that are possible, but very difficult to hit cleanly (-FBLS, for example), eight million possible different strokes looks like a pretty good estimate.
Misstrokes and Your Dictionary
If there’s a word you frequently misstroke, what’s the easiest way to deal with it? Put the misstroke in your dictionary! In the early days of CAT, you had to watch your dictionary size carefully. Many systems had upper limits on dictionary size, and a large dictionary could dramatically slow down your translation. With today’s computers, massive dictionaries can easily be held in memory, and dictionaries with over 100,000 entries have become routine.
When I taught CAT back in the 1980s, I encouraged my students to keep their dictionaries lean and mean for fast translation and easy editing. Today, a few extra entries won’t hurt anything, and I even recommend adding misstrokes preemptively. When you’re adding a word or phrase to your dictionary, think how you might misstroke it, and put those misstrokes in during your prep. That way, your realtime translation just might come out cleaner. It’s not feasible, however, to add all of the possible ways you might misstroke something.
How many ways are there to misstroke something? Let’s take a simple stroke like KATS. You need three fingers for that stroke: left ring finger for the K-, left thumb for the A, and right little finger for the -TS. In figure 1, the green circle (position 5) shows where your right little finger should press for this stroke, and the eight red circles show what happens as that finger goes off-center. If you press at circle 4, for example, you’ll get -LGTS instead of -TS. The nine possible finger positions generate nine possible steno combinations, eight of which are incorrect. It is also possible that your finger doesn’t go down far enough to register at all. That brings us up to ten combinations, nine of which are errors.
Your finger could, of course, be even farther off than the eight “error positions” shown in figure 1, but that’s unlikely and uncommon enough that we don’t need to consider it. What if your finger is halfway between the correct position and one of the positions I’ve shown as an error? Well, either the key will register on the computer, or it won’t. You can’t get half of an error, and since we’re just counting possible errors here, we don’t have to consider that possibility.
But what of the K- in KATS? Since the two S- keys are effectively one key, does this reduce the possibilities? No, because each position still generates a different stroke. Position 1 produces ST-, position 4 produces STK-, and position 7 produces SK-. There are still ten possible positions (the nine circles plus “no stroke”).
With your thumb, there are only two keys, and nothing else close enough for a likely error, so instead of ten possible combinations, there are only four (A, O, AO, and no vowel at all).
The grand total, then, for the stroke KATS, is 10 × 10 × 4 = 400 possible ways you could write it. One of those is correct, and one (none of the fingers registering at all) can’t be entered in a dictionary, so there are 398 possible misstrokes. And this doesn’t even factor in the possibility of shadowing with another finger!
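The same count can be reproduced by listing every combination of finger outcomes. This is a sketch under the text’s assumptions (nine positions per finger plus “didn’t register,” four thumb outcomes, no shadowing); the labels pos1 through pos9 are hypothetical stand-ins for the circles in figure 1, not actual steno output.

```python
from itertools import product
from math import prod

# Outcomes per finger for KATS. pos1..pos9 stand in for the nine circles
# (pos5 is the correct spot); "none" means the finger didn't register.
ring_finger = [f"pos{n}" for n in range(1, 10)] + ["none"]    # K-
little_finger = [f"pos{n}" for n in range(1, 10)] + ["none"]  # -TS
thumb = ["A", "O", "AO", "none"]                              # vowels

fingers = [ring_finger, little_finger, thumb]
correct = ("pos5", "pos5", "A")
nothing = ("none", "none", "none")  # no keys at all: not a dictionary entry

total = prod(len(outcomes) for outcomes in fingers)
misstrokes = [combo for combo in product(*fingers)
              if combo not in (correct, nothing)]

print(total)            # 400
print(len(misstrokes))  # 398
```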
Should you enter all 398 of these into your dictionary as misstrokes for cats? Definitely not! Many of those are unlikely (pats, for example), and many others are valid words, such as cat, scat, scats, cot, cots, and cogs. Even with dictionaries of unlimited size, entering all of the possible misstrokes for a word just doesn’t make sense. Instead, enter only the ones that actually happen to you, and don’t conflict with other words.
[Author Note 2015: Most of today’s CAT software includes algorithms for identifying and correcting errors like this automatically. These algorithms are often incorrectly called “artificial intelligence,” but we’ll save that argument for another article!]
Communicating With the Computer
I’ve heard people wonder why it took steno keyboard manufacturers so long to move from the old, slow serial interface to newer high-speed interfaces like USB. The answer? They didn’t need the speed. The only reason for adding USB to a steno keyboard is that so many new computers don’t have serial ports anymore.
When computers communicate, they break data into chunks. The basic chunk is called a “bit,” and it’s equal to a binary digit. A bit can represent either one or zero. The next bigger chunk is the “byte,” which has eight bits. It can hold a number from 0 to 255. A steno stroke requires 23 bits—one for each key on the steno keyboard—which means it will fit in three bytes. Many interfaces have extra information such as stenomarks, so four bytes is more typical.
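To see why 23 keys fit in three bytes, here is a sketch that gives each key one bit and packs a stroke into a 23-bit number. The key order below (number bar first, then steno order) and the function names are assumptions for illustration; actual writers and CAT programs define their own formats, usually with extra bytes as noted above.

```python
# One bit per key. This ordering is an illustrative assumption, not any
# particular writer's protocol.
KEYS = ["#", "S-", "T-", "K-", "P-", "W-", "H-", "R-",
        "A", "O", "*", "E", "U",
        "-F", "-R", "-P", "-B", "-L", "-G", "-T", "-S", "-D", "-Z"]


def encode(stroke):
    """Pack a set of key names into 3 bytes (23 bits used, 1 spare)."""
    unknown = set(stroke) - set(KEYS)
    if unknown:
        raise ValueError(f"not steno keys: {sorted(unknown)}")
    mask = 0
    for bit, key in enumerate(KEYS):
        if key in stroke:
            mask |= 1 << bit
    return mask.to_bytes(3, "big")


def decode(data):
    """Unpack 3 bytes back into the set of pressed keys."""
    mask = int.from_bytes(data, "big")
    return {key for bit, key in enumerate(KEYS) if mask & (1 << bit)}


kats = {"K-", "A", "-T", "-S"}
packed = encode(kats)
print(len(KEYS), len(packed))  # 23 3
print(decode(packed) == kats)  # True
```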
Communication speed for computers is measured in bits per second. For some rather complex reasons involving “start bits” and other obscure telecommunications protocols, it usually takes ten bits rather than eight to transmit a byte over a serial port. If a stroke takes four bytes, then it will take 40 bits to send it over a typical serial line.
If you can sustain a speed of five strokes per second, then you’re really moving. If each stroke takes 40 bits to send, then a steno machine would have to communicate with the computer at 5 × 40 = 200 bits per second to handle that blazing five-stroke-per-second speed.
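The arithmetic in the last two paragraphs fits in a few lines of Python. This sketch assumes the text’s figures (ten bits per byte on the wire, four bytes per stroke); the ten-bit figure matches a common serial framing of one start bit, eight data bits, and one stop bit.

```python
BITS_PER_BYTE_ON_WIRE = 10  # e.g. 1 start bit + 8 data bits + 1 stop bit
BYTES_PER_STROKE = 4        # 3 bytes of keys plus extra info, as assumed above


def required_bps(strokes_per_second,
                 bytes_per_stroke=BYTES_PER_STROKE,
                 bits_per_byte=BITS_PER_BYTE_ON_WIRE):
    """Serial line speed (bits per second) needed to keep up with a writer."""
    return strokes_per_second * bytes_per_stroke * bits_per_byte


print(required_bps(5))  # 200
```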
Does your computer have a modem? If so, it’s probably a 56K modem, which can transmit approximately 56,000 bits of information each second over a standard serial line to your computer (I’m oversimplifying here, but that’s close enough for our purposes). That’s 280 times faster than your steno machine needs to communicate, and your serial port is capable of operating much faster than that. Even today, captioners routinely use old, slow modems, because they simply don’t need the speed.
USB is a much better way to download huge picture files from your camera, but it is overkill for your steno machine by at least four orders of magnitude.
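To put the headroom in numbers, this sketch divides nominal link rates by the 200 bits per second computed earlier and takes the base-10 logarithm to count orders of magnitude. The 56K figure is the text’s simplification; 12 Mbit/s and 480 Mbit/s are the nominal signaling rates of USB full speed and USB 2.0 high speed, and real-world throughput is lower.

```python
from math import log10

NEEDED_BPS = 5 * 4 * 10  # 5 strokes/s x 4 bytes/stroke x 10 bits/byte

# Nominal signaling rates in bits per second.
LINKS = {
    "56K modem": 56_000,
    "USB full speed": 12_000_000,
    "USB 2.0 high speed": 480_000_000,
}

headroom = {name: bps / NEEDED_BPS for name, bps in LINKS.items()}

for name, ratio in headroom.items():
    print(f"{name}: {ratio:,.0f} times faster than needed "
          f"({log10(ratio):.1f} orders of magnitude)")
```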
How Fast is Fast?
Wouldn’t it be nice to have a speedometer on your steno machine to show you how fast you were writing? You wouldn’t want it to face the attorneys, of course, because they’d take it as a matter of pride if they could “red line” your steno machine! The problem here is figuring out exactly how to measure your writing speed.
On a standard typewriter or computer keyboard, which is called a QWERTY keyboard because of the keys at the top left, people use a very simple measurement for words per minute (wpm). If you assume an average English word is five letters plus a space or punctuation mark, then you can just count how many keys you press in a minute, divide by six, and you have your typing speed in wpm.
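The QWERTY rule of thumb translates directly into code. This is a sketch using the text’s six-keystrokes-per-word convention; many typing tests use five instead (counting the space as one of the five), so the constant is kept in one place where it is easy to change.

```python
KEYSTROKES_PER_WORD = 6  # five letters plus a space or punctuation mark


def qwerty_wpm(keystrokes, minutes):
    """Words per minute from a raw keystroke count."""
    return keystrokes / KEYSTROKES_PER_WORD / minutes


def keystrokes_per_minute(wpm):
    """Keystrokes per minute needed to reach a target speed."""
    return wpm * KEYSTROKES_PER_WORD


print(qwerty_wpm(keystrokes=1_800, minutes=5))  # 60.0
print(keystrokes_per_minute(200))               # 1200
```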
It’s a bit more difficult in steno. What do you really want to measure? If it’s the speed of your hands, count strokes. If it’s your final output, count letters or actual words. Schools and speed tests tend to count syllables, figuring that it’s a more consistent (and less theory-dependent) measure of what you’re doing.
Using the chart in Mark Kislingbury’s article “Rev Up Your Writing,” in the July/August 2004 edition of the JCR, writing at 200 wpm could mean anywhere from three strokes per second (if you use an incredibly brief-heavy theory like Mark’s) on up to five strokes per second (if you write everything out and come back for all of your inflected endings). For comparison, writing 200 wpm on a QWERTY keyboard would require typing 1,200 keystrokes per minute, or 20 keystrokes per second, a virtually impossible feat.
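Working backward from those figures shows how many strokes per word each style implies. A minimal sketch; the three and five strokes-per-second endpoints are the ones quoted from the chart, and the function name is hypothetical.

```python
def strokes_per_word(strokes_per_second, wpm):
    """Average strokes per word implied by a stroke rate at a given speed."""
    return strokes_per_second * 60 / wpm


for style, sps in [("brief-heavy", 3), ("write-it-out", 5)]:
    print(f"{style}: {strokes_per_word(sps, 200):.2f} strokes per word")

# The QWERTY comparison at six keystrokes per word.
print(200 * 6 / 60, "keystrokes per second")  # 20.0 keystrokes per second
```

A result below 1.0, as in the brief-heavy case, means the writer averages fewer than one stroke per word, which only briefs and phrases make possible.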
With the advanced displays available on the latest crop of steno machines, a speedometer showing strokes per second (or strokes per minute) would be easy to add. A words-per-minute speedometer is slightly more complex: it would require a steno machine that translates on board, so that it could count the actual words generated per minute. Either type of speedometer could easily be added to a CAT program, as some vendors have already done.
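As a sketch of how a strokes-per-minute speedometer might work, here is a sliding-window counter: it remembers stroke timestamps and discards any that are older than the window. The class and method names are hypothetical, not any writer’s or CAT program’s API; a words-per-minute version would record translated words instead of raw strokes.

```python
from collections import deque


class StrokeSpeedometer:
    """Strokes per minute over a sliding time window (hypothetical class)."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.times = deque()  # timestamps of recent strokes, oldest first

    def _expire(self, now):
        # Drop strokes that have fallen out of the window.
        while self.times and self.times[0] <= now - self.window:
            self.times.popleft()

    def stroke(self, timestamp):
        """Record one stroke at `timestamp` (in seconds)."""
        self.times.append(timestamp)
        self._expire(timestamp)

    def strokes_per_minute(self, now):
        """Strokes in the current window, scaled to a per-minute rate."""
        self._expire(now)
        return len(self.times) * 60.0 / self.window


meter = StrokeSpeedometer()
for i in range(300):  # five strokes per second for one minute
    meter.stroke(i * 0.2)
print(meter.strokes_per_minute(now=59.9))  # 300.0
```

Dividing the reading by 60 gives strokes per second, the unit used in the previous section.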
Your Poor, Abused Steno Keyboard
When my brother and I were working on the design of a steno keyboard, we went to see a parts manufacturer about some custom key stem designs. One of the first questions he asked was, “How many times will this key get pressed?” Wow! What a question!
We approached it by looking at worst-case numbers. A busy court reporter who works five days a week writes a whole lot of strokes. A quick survey of reporters showed that the number of strokes written in a full day varies all over the map, but we settled on 75,000 strokes as being a pretty big number. Assuming a couple of weeks of vacation time, that reporter will work 50 weeks in a year, at 5 days per week, for a total of 250 days in a year. Multiply that by our 75,000-stroke day, and we’ve got 18,750,000 strokes per year. Do your wrists hurt yet?
How long does a steno machine last? While we know reporters using machines that are 20+ years old, CAT reporters have a tendency to upgrade more often than that. Even so, you wouldn’t want to buy a machine that wouldn’t last 20 years, would you? Doing the math, that means your steno machine could easily take over a third of a billion (375,000,000) strokes over the course of a 20-year career!
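The wear estimate is a chain of multiplications, sketched here with the survey-based assumptions from the text as named constants so that each one can be adjusted for your own schedule.

```python
STROKES_PER_DAY = 75_000  # the "pretty big number" from the survey
DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 50       # two weeks of vacation
CAREER_YEARS = 20

strokes_per_year = STROKES_PER_DAY * DAYS_PER_WEEK * WEEKS_PER_YEAR
career_strokes = strokes_per_year * CAREER_YEARS

print(f"{strokes_per_year:,} strokes per year")  # 18,750,000 strokes per year
print(f"{career_strokes:,} strokes per career")  # 375,000,000 strokes per career
```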
A Career on a Disk
In the early days of CAT, dictionary size limitations often came from what could fit on a floppy diskette. PC-based CAT systems started out with hard drives holding 20 megabytes or so. Backing up was important not only for safety, but also because you just couldn’t fit much information on a hard drive that small.
Today, instead of backing up dictionaries on floppies, we can just write them to a blank CD, along with everything else we need to save. Hard drives typically hold over 1,000 times what those early drives held. I was computer game shopping with my son-in-law the other day, and he showed me the newest game in his favorite series. On the box, it said that it requires over 11 gigabytes of free space on your hard drive just to install it!
That, of course, got me thinking. With today’s storage capacities, what would it take to back up a reporter’s entire career? We calculated earlier that a stroke of steno takes four bytes. That means our 375 million strokes would require 1.5 billion bytes. As a side note, a billion bytes is not the same as a gigabyte. The metric prefixes mean something a little different in the computer world, as they go up by factors of 1,024 (two to the tenth power) rather than factors of 1,000 (ten to the third power). This gives us:
1 kilobyte (1 KB) = 1,024 bytes
1 megabyte (1 MB) = 1,024 KB = 1,048,576 bytes
1 gigabyte (1 GB) = 1,024 MB = 1,073,741,824 bytes
1 terabyte (1 TB) = 1,024 GB = 1,099,511,627,776 bytes
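The table above can be generated from powers of 1,024, and the same helper shows why 1.5 billion bytes of steno comes to noticeably less than 1.5 gigabytes. A minimal sketch; the function name is hypothetical.

```python
UNIT_NAMES = ["kilobyte", "megabyte", "gigabyte", "terabyte"]


def binary_unit_size(power):
    """Bytes in a unit that is 1,024**power bytes."""
    return 1024 ** power


for power, name in enumerate(UNIT_NAMES, start=1):
    print(f"1 {name} = {binary_unit_size(power):,} bytes")

# A billion bytes is not a gigabyte: the career's 1.5 billion bytes of steno.
steno_bytes = 375_000_000 * 4
print(f"{steno_bytes:,} bytes = {steno_bytes / binary_unit_size(3):.2f} gigabytes")
```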
[Author Note 2015: For more about large & small numbers, see my article “Billions and Billions: A math lesson for NBC.”]
What about the final transcripts? That depends largely on the format you store them in. In the simplest ASCII format, a 250-page job with typical formatting takes about 375,000 bytes to store, or roughly 1,500 bytes per page (your mileage may vary). Assuming one such job for each of our 250 work days per year over a 20-year span, that’s 1,875,000,000 bytes of ASCII transcript. Add that to our 1.5 billion bytes of steno, and we get 3,375,000,000 bytes, or about 3.14 gigabytes.
Many new computers now offer writable DVD drives, and a DVD holds about 4.7 billion bytes (roughly 4.38 gigabytes in the 1,024-based units above). This means that a single disk is capable of holding every steno stroke and every final transcript from an entire 20-year reporting career, with extra space to spare for dictionary backups and digital photos of your favorite attorney clients (just joking on that last one).
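Here is the storage tally as a sketch, assuming one 250-page transcript per work day (which is what the totals above imply) and treating a DVD’s advertised 4.7 GB as 4.7 billion bytes.

```python
GB = 1024 ** 3

WORK_DAYS = 250 * 20                    # 250 days a year for 20 years
steno_bytes = 375_000_000 * 4           # career strokes x 4 bytes per stroke
transcript_bytes = WORK_DAYS * 375_000  # assumes one 250-page job per day
total = steno_bytes + transcript_bytes

DVD_BYTES = 4_700_000_000  # a DVD's "4.7 GB" is 4.7 billion bytes

print(f"{total:,} bytes = {total / GB:.2f} GB")
print(f"Fits on one DVD: {total <= DVD_BYTES} "
      f"({(DVD_BYTES - total) / GB:.2f} GB to spare)")
```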
What about audio linkage? Assuming 8-hour days, that 20-year career would involve a mind-numbing 40,000 hours of listening to people argue. A typical computer audio recording using the .WAV file format takes up almost 40 MB per hour, even for low-quality mono recording. Saving all of that audio would require roughly 1.5 terabytes, which is vastly more information than a DVD can hold. Of course, Web servers with over a terabyte of disk space aren’t uncommon anymore, and compressed audio formats like MP3 can shrink that 40 MB per hour dramatically, even when the recording itself is of higher quality than that low-quality WAV file.
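Uncompressed WAV size follows directly from the sample rate, sample size, and channel count. This sketch shows one low-quality setting (11,025 Hz, 8-bit, mono, chosen here as an assumption) that lands near the text’s figure, then totals the career using the text’s round 40 MB per hour.

```python
MB = 1024 ** 2
TB = 1024 ** 4


def pcm_bytes_per_hour(sample_rate, bytes_per_sample, channels):
    """Size of one hour of uncompressed PCM audio, as stored in a WAV file
    (ignoring the small file header)."""
    return sample_rate * bytes_per_sample * channels * 3600


# One low-quality setting (11,025 Hz, 8-bit, mono) lands near 40 MB/hour.
print(f"{pcm_bytes_per_hour(11_025, 1, 1) / MB:.1f} MB per hour")

hours = 250 * 20 * 8           # 5,000 work days of 8 hours each
audio_bytes = hours * 40 * MB  # using the text's round 40 MB per hour
print(f"{hours:,} hours -> {audio_bytes / TB:.2f} TB of WAV audio")
```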
In the last 20 years, we’ve gone from floppy diskettes holding 360 KB to DVD+RWs holding 4.7 billion bytes. That’s over 10,000 times the storage capacity. If the pattern holds, then in another 20 years, we shouldn’t have any problem holding the steno, finished transcripts, and audio of a reporter’s entire career on a single disk, or perhaps on something that looks more like a key fob, a wristwatch, or a credit card.
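The growth claim can be checked, and extrapolated, in a few lines. This sketch assumes the same growth factor simply repeats over the next 20 years, which is the “if the pattern holds” assumption above, not a forecast.

```python
FLOPPY_BYTES = 360 * 1024      # 360 KB floppy
DVD_BYTES = 4_700_000_000      # 4.7 billion bytes
growth = DVD_BYTES / FLOPPY_BYTES

# Career needs from the article: steno + ASCII transcripts + WAV audio.
career_bytes = 3_375_000_000 + 40_000 * 40 * 1024 ** 2

future_disk = DVD_BYTES * growth  # assumes the same growth factor repeats
print(f"Growth over 20 years: about {growth:,.0f}x")
print(f"Hypothetical future disk: {future_disk / 1024 ** 4:.1f} TB; "
      f"fits a career: {career_bytes <= future_disk}")
```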
Perhaps all this recreational number crunching has started your mind racing. Perhaps your eyes have glazed over and the only crunching you’re thinking of is Ben & Jerry’s Heath Bar Crunch Ice Cream. Either way, you just may look at your steno keyboard a little differently tomorrow morning.
SIDEBAR: About the Numbers
In Stephen Hawking’s wonderful book, A Brief History of Time, he says that he was told every equation he used would halve his sales. He managed to cover everything from Heisenberg’s uncertainty principle to string theory using only one equation ($E = mc^2$). If he could pull that off, I figured I could do no less with this article. Unfortunately, either steno is more complex than I thought, or Professor Hawking is just a better writer, because I really needed to slip in a few formulas. I trust they won’t slow you down.
To write this article, I had to make some assumptions. One of them is that you’re using a traditional steno keyboard, with one initial S key, one asterisk, and one number bar. If you start adding function keys, splitting the initial S, and breaking up the number bar, everything changes. Users of steno keyboards like the Gemini and the various Digitext machines have far more potential strokes, assuming their CAT software supports all of the capabilities of their writer.