Lies, Damn Lies, and Sample Size

Hey there. Yeah, you. What magazines do you read? What programs do you watch on television? What radio stations do you listen to? What web sites do you visit? No, I haven’t started channeling John Ashcroft. I’m asking the kind of questions decent marketers ask – if they want to keep their jobs.

In marketing, it’s all about knowing your audience. Or, rather, knowing what a statistically accurate sub-sample of your given audience would do under a given set of parameters. Let’s face it, only the federal government is crazy enough to try and count every single human being in America and even they are trying to get out of the business. Even spaced at ten-year intervals, that can be a pain.

Marketers like to take the shorter route: market research by statistical sampling. That is, taking a sample that is representative of the population and extrapolating the results. It can be a complicated process. Having endured two semesters of statistics in college, I am of the mindset that the calculations alone take only slightly less time than actually counting everyone, but then I’ve never been a fan of math. Actual statisticians swear by the practice. It saves money, it saves time, and it can actually be more accurate than hand counts – provided that your study doesn’t have other drawbacks.

The most well-known application of statistical sampling is the measurement of the American television audience done by Nielsen Media Research. Few Americans haven’t heard of the company and even fewer haven’t heard of the results of its study, the Nielsen TV Ratings. People who have never been within miles of a statistics book can rattle off the Nielsen terminology – 31 share or 20.3 rating – at will. (By the way, these are the numbers Friends got a few weeks ago.)
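For anyone who hasn’t memorized the jargon: in the standard industry definitions, a rating is the percentage of all television households tuned to a program, while a share is the percentage of households with a set switched on at that moment that are tuned to it. Divide one by the other and you get a rough idea of how many households had the TV on at all. Here is a minimal sketch of that arithmetic; the function name is my own invention, not Nielsen’s.

```python
def households_using_tv(rating, share):
    """Percent of TV households with a set on, implied by a rating and a share.

    Assumes the standard definitions: rating = % of all TV households,
    share = % of households currently watching anything.
    """
    return rating / share * 100

# The Friends numbers quoted above: a 20.3 rating and a 31 share.
print(f"{households_using_tv(20.3, 31):.1f}% of TV households had a set on")
```

In other words, those two numbers together imply that roughly two-thirds of television households were watching something that night.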

The fact that the average person knows so much about the Nielsens is an indication of just how important they are to American business. Literally billions of dollars ride upon the outcome of the ongoing Nielsen study. The Nielsens estimate (we’ll go into how well later) several things about the television audience. They estimate the size, age, gender, and income of television viewers watching given vehicles during specific periods of time. I say vehicles because the bottom line is, television programs are essentially just means of delivery for what’s more important to the economy, the ads. Television programs get cancelled not because they are bad (witness the crime against nature that is the Anna Nicole Show) but because they aren’t doing their job – delivering the ads to the target audience. The size of a vehicle’s Nielsen ratings determines the price that an ad aired during that vehicle can command. Size matters. That’s why you’ll never see an ad for your local used car dealership on NBC’s Thursday night lineup.

Or, at least, that’s what Nielsen Media Research says. But almost since their inception, people have been questioning whether the Nielsens are accurately estimating the American television audience. And since there is so much money at stake, Nielsen is constantly trying to convince the American business community and the American public that its numbers are good.

Traditionally, many of the complaints about the Nielsen study have been about the size of its sample – just over 5,000 households containing 13,000 individuals. While Nielsen’s web site offers up a convoluted analogy involving a pot of vegetable soup, a cup, and the behavior of diced carrots as evidence that they have this sampling thing down, doubters remain.
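For the curious, here is a back-of-the-envelope sketch of what a 5,000-household sample buys you. It uses the textbook margin-of-error formula for a proportion and assumes a simple random sample, which is my simplification and not a description of how Nielsen actually draws or weights its panel. The function name and the 95 percent confidence level are likewise my own choices.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion.

    p: observed proportion (a 20.3 rating is 0.203)
    n: sample size, assumed here to be a simple random sample
    z: normal critical value (1.96 for roughly 95% confidence)
    """
    return z * math.sqrt(p * (1 - p) / n)

households = 5000  # "just over 5,000 households," rounded down
rating = 0.203     # the 20.3 rating mentioned earlier, as a proportion

moe = margin_of_error(rating, households)
print(f"A {rating * 100:.1f} rating is good to about +/- {moe * 100:.1f} points")
```

Under those assumptions, a hit show’s rating is good to within about a point either way. The same formula is far less forgiving for a program that only a sliver of the sample watches, since the error becomes large relative to the rating itself.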

Having worked with both Nielsen data (which by the way, advertisers and television networks must pay a hefty annual subscription fee to access) and data from the magazine readership equivalent, MRI, I believe sample size is just the tip of the iceberg in terms of accuracy.

For one thing, Nielsen’s legendary people meters, boxes hard-wired to television sets in Nielsen households to transmit viewer data back to the company, have failed to keep up with changes in cable and satellite technology. Nielsen’s people meters record the set’s channels and later match the channels to the local channel lineups. The emergence of competing cable and satellite providers in local markets means that not only do households have access to up to 500 channels, but local market lineups often overlap. One example: hundreds of households with cable in Northeast Pennsylvania receive signals from both New York City and Philadelphia stations, which means Nielsen is probably underreporting viewership for those local channels.

In addition, Nielsen assigns each member of one of its households a special code which they must enter into the people meter each time they enter a room to watch TV. Critics argue that the codes mean Nielsen is underreporting the viewing habits of older adults and children, who may not be able to remember or even read the codes. They say the coding system also underreports the viewing habits of visitors to a Nielsen household. In 1989, the Committee on Nationwide Television Audience Measurement (CONTAM), an industry group formed to monitor Nielsen, published a report that alleged many problems with Nielsen people meter data. They argued that telephone surveys of the Nielsen population showed that “people meters missed half of visitors to households, one in four men aged 18 to 34, and one in eight women aged 18 to 34 . . . [P]eople meters also missed a high proportion of children, especially teenagers.”

Many people also do not realize that not all of the 200 markets Nielsen measures use people meters. Nielsen families in smaller markets, such as Butte, Montana, still use paper diaries to record their viewership, which then must be mailed to Nielsen for tabulation. The accuracy of the data in these markets is therefore dependent on highly subjective items: respondent honesty and memory.

In fact, critics argue that a failure to account for changes in human behavior is the largest drawback in the Nielsen study. Television viewing habits have changed dramatically in the twenty years since people meters debuted. For many families, TV is background noise, on virtually all the time, regardless of who is actually watching. With the invention of the remote control, the number of people who “scan” through the ads has increased dramatically. VCRs pose another problem. Nielsen has yet to devise a way to accurately measure the number of people who record shows for later viewing. Television network executives, who have been watching Nielsen figures for their shows plummet in the last decade, are very vocal about Nielsen’s drawbacks. Don Ohlmeyer, former president of NBC’s West Coast division, said, “They’re trying to measure 21st-century technology with an abacus.”

In 1991, advertisers and networks had had enough. They helped fund a project that was envisioned as an alternative to Nielsen. The project, called Systems for Measuring And Reporting Television (SMART), used meters as well, but with a simpler code system, simpler installation process, and, supporters say, a better plan for capturing program data. The data would be taken from a unique code embedded in each show’s signal. SMART promised to make its system easier for families to use, thus resulting in better data. SMART’s creators also promised to simplify the complex subscription plan that Nielsen had subjected the networks and advertisers to for so many years. They would offer access to all of their data for a flat fee.

In the early 90s, the future seemed bright for SMART. They were testing their system with 500 Philadelphia-area families. All four networks and several large advertisers and advertising agencies had signed on with funding. The problem was, developing SMART would take time, perhaps 10 years or more. In the meantime, everyone still had to pay Nielsen for its data. By 1996, the networks and other backers were no longer willing to pour money into both Nielsen and the fledgling system and pulled out. SRI, the company that ran SMART, abandoned the system as a potential ratings competitor (though it does make the data it collected available and has used the methodology as a springboard for research in other media uses, including the Internet). Network executives were disappointed but satisfied that the attempt had been a step in the right direction. As one executive wrote, “SMART failed as a business proposal and not as a ratings proposal.” The focus now has shifted back to urging Nielsen to improve its methodology, which the company began to do when the threat of SMART loomed.

As a participant in the circus that is advertising measurement and purchasing, I think the lesson is this: knowing your audience is not enough. Respecting them and their behaviors is as, if not more, important. I spent a year on a magazine study that forced respondents to sort a stack of over 300 cards emblazoned with magazine logos as an attempt to measure readership. It was a door-to-door study, conducted in the respondents’ homes. Well, that is, if the interviewers could talk their way inside. Each interview took almost two hours and respondents were compensated for completing this feat with a fake leather keychain. The length of the interview and the compensation had not changed in almost 10 years – for purposes of tracking, we were told. Was it any wonder that response rates were falling? This study, much like Nielsen and other large-scale media research studies, was attempting to measure audience behavior using outdated tools. No sample size is large enough to overcome a shortcoming like that.