Huff: How to Lie with Statistics

From Scienticity


Revision as of 00:08, 20 November 2008

Scienticity: 3 bookbugs
Readability: 3 bookbugs
Hermeneutics: 4 bookbugs
Charisma: 4 bookbugs
Recommendation: 3 bookbugs
Ratings are described on the Book-note ratings page.

Darrell Huff, How to Lie with Statistics. New York: W. W. Norton & Company, 1954/1993. 142 pages, illustrated by Irving Geis.

Regardless of whether Benjamin Disraeli was the first to refer to "lies, damned lies, and statistics", people have been suspicious of statistics for a long time, and not without some reason. Statistics organize large amounts of data, characterizing those data with one or two numbers. Clearly, detail is lost, but if the statistics are used honestly, with the goal of providing understanding, they are valuable. However, when statistics are abused to promote a marketing scheme or political agenda, they are worse than worthless.

The problem lies not with the statistics but with the malfeasance of the abusers, of course, although this detail is often lost in the public understanding—or lack of understanding—of just what lies behind a statistic, and how it can be used to lie.

Correcting that misunderstanding is the goal of author Huff. As he says in the introduction to his book:

The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, "opinion" polls, the census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense. [p. 8]

In other words, the purpose of his book is

... explaining how to look a phony statistic in the eye and face it down; and no less important, how to recognize sound and usable data in that wilderness of fraud to which the previous chapters have been largely devoted. [p. 122]

His approach is to spend the first nine chapters recounting telling and amusing stories about the recognizable forms of statistical abuse, and explaining how to spot each one when it happens. He touches on all the necessary topics: sampling bias, cherry-picking data for averages, failing to report statistical uncertainties, misleading graphing techniques, comparing statistics that shouldn't be compared, and, in a chapter called "Post Hoc Rides Again", the art of using correlation to imply a causal relationship. (The title refers to the phrase "post hoc ergo propter hoc", meaning "after this, therefore because of this", or, as it is usually rendered today, "correlation does not imply causation".) Here is one example from "Post Hoc Rides Again":

Professor Helen M. Walker has worked out an amusing illustration of the folly in assuming there must be cause and effect whenever two things vary together [i.e., correlate]. In investigating the relationship between age and some physical characteristics of women, begin by measuring the angle of the feet in walking. You will find that the angle tends to be greater among older women. You might first consider whether this indicates that women grow older because they toe out, and you can see immediately that this is ridiculous. So it appears that age increases the angle between the feet, and most women must come to toe out more as they grow older.

Any such conclusion is probably false and certainly unwarranted. You could only reach it legitimately by studying the same women—or possibly equivalent groups—over a period of time. That would eliminate the factor responsible here. Which is that the older women grew up at a time when a young lady was taught to toe out in walking, while the members of the younger group were learning posture in a day when that was discouraged.

When you find somebody—usually an interested party—making a fuss about a correlation, look first of all to see if it is not one of this type, produced by the stream of events, the trend of the times. In our time it is easy to show a positive correlation between any pair of things like these: number of students in college, number of inmates in mental institutions, consumption of cigarettes, incidence of heart disease, use of X-ray machines, production of false teeth, salaries of California school teachers, profits of Nevada gambling halls. To call some one of these the cause of some other is manifestly silly. But it is done every day. [pp. 96–97]
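Huff's closing point, that two quantities riding the same trend of the times will correlate whether or not either one causes the other, is easy to demonstrate. The sketch below uses two invented yearly series (hypothetical numbers, not real data) that both drift upward, computes the Pearson correlation of their levels, and then computes it again on their year-to-year changes. First differencing is used here as one common check among several; it strips out a shared linear trend, though it is not a general cure for spurious correlation.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def diffs(xs):
    """Year-to-year changes; this removes a shared linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


# Hypothetical, invented series for ten consecutive years. Each is a steady
# upward trend plus a small wobble unrelated to the other series.
years = range(10)
enrollment = [100 + 5 * t + (t % 3) for t in years]   # stand-in for "students in college"
gambling = [20 + 2 * t + (7 * t) % 4 for t in years]  # stand-in for "gambling-hall profits"

r_levels = pearson(enrollment, gambling)
r_changes = pearson(diffs(enrollment), diffs(gambling))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

With these made-up numbers the levels correlate at roughly 0.98, while the year-to-year changes are essentially uncorrelated: the impressive figure came entirely from the shared upward drift.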

In the more than 50 years since it was written, this book has become rather famous and has been widely read, and not without good reason. Huff's writing is engaging, he knows what he's talking about, and he clarifies quite a few important points. The many illustrations by Irving Geis are noticeably from an earlier time, but they still do their job and are still entertaining.

The book's relevance and utility have hardly diminished: statistics are still routinely abused. Whether there is more abuse now or less is hard to say, but the abusers' tricks are much as they were earlier in the last century.

A few things have changed since Huff was writing. He was rightly concerned at the time about the statistical sleight of hand by which marketers blurred three different statistics, the mean, the median, and the mode, by calling them all "averages". That practice has virtually disappeared: the proper terms are now used almost universally, with the mean being the only statistic called an "average".
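To see why calling all three statistics an "average" invites mischief, here is a small hypothetical payroll (invented figures, not Huff's own example) run through Python's standard `statistics` module. A single large salary is enough to pull the mean far above what a typical employee earns.

```python
from statistics import mean, median, mode

# Hypothetical annual salaries at a small firm (invented figures).
salaries = [2000, 2000, 2000, 2000,
            3000, 3000, 3000,
            4000, 6000, 10000, 40000]

print("mean:  ", mean(salaries))    # sum / count, pulled upward by the 40000
print("median:", median(salaries))  # middle value of the sorted list
print("mode:  ", mode(salaries))    # most frequently occurring value
```

These figures give a mean of 7000, a median of 3000, and a mode of 2000: three honest "averages" of the same payroll, differing by a factor of 3.5, so whoever picks the word "average" gets to pick the number.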

Also since Huff's time, the idea of sampling bias has penetrated into the public consciousness with a vengeance: these days uninformed critics of any public-opinion poll or correlation study (see "Post Hoc Rides Again") whose results displease them will immediately respond by alleging a clear case of sampling bias, generally without the least idea of how the sample was actually collected. That kind of naiveté is still embarrassing even though it's dressed up as statistical sophistication.
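Whether a poll is biased depends on how its sample was drawn, which is exactly what the armchair critics rarely know. As a hedged illustration with an invented population (not taken from the book), the sketch below compares a simple random sample with a convenience sample drawn only from an easy-to-reach subgroup whose opinions differ from everyone else's. The group sizes and "yes" rates are assumptions chosen for clarity.

```python
import random

# Hypothetical population of 10,000 people: 1 = "yes", 0 = "no".
# Group A (2,000 easy-to-reach people) says yes 80% of the time;
# group B (8,000 others) says yes 30% of the time. Invented numbers.
group_a = [1] * 1600 + [0] * 400
group_b = [1] * 2400 + [0] * 5600
population = group_a + group_b


def share_yes(sample):
    """Fraction of respondents in the sample who answered yes."""
    return sum(sample) / len(sample)


true_share = share_yes(population)  # 4000 / 10000

rng = random.Random(1954)  # fixed seed so reruns give the same samples

# Simple random sample: every person equally likely to be chosen.
random_sample = rng.sample(population, 1000)

# Convenience sample of the same size, drawn only from group A.
convenience_sample = rng.sample(group_a, 1000)

print(f"true share:         {true_share:.2f}")
print(f"random sample:      {share_yes(random_sample):.2f}")
print(f"convenience sample: {share_yes(convenience_sample):.2f}")
```

Both samples have 1,000 respondents, yet only the random one lands near the true 40 percent; the convenience sample lands near 80 percent. The size of a sample says little about its bias; knowing how it was collected is what matters.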

And so it seems that this little book by Darrell Huff still has its work cut out for it. Probably what has changed most in the last 50 years is the manner of discourse in a book like this: it has no sidebars, no pull-quotes, none of the gewgaws that modern-day readers seem to expect (or that publishers believe readers expect). That may lend the book a vaguely quaint air, but there is no good replacement for it, and the statistical lies that Huff describes are still with us and still worth developing defenses against.

-- Notes by JNS
