
Prologue
When I first heard the phrase "how to lie with statistics", I was immediately interested. How can people lie using statistics? Are all statistics lies? What characterizes statistical products that are not lies? And how can we recognize neutral ones? In this post I will try to explain how to lie with statistics according to Darrell Huff.
Exploration
How to Lie with Statistics is a book by Darrell Huff, a writer who was an editor of Liberty Magazine. There, he wrote several other articles on the same theme. The book has its pros and cons, because some of its examples are now considered dated, but Huff's idea of how to distinguish good statistics from bad ones is still worth studying.
There are three types of lies — lies, damn lies, and statistics
Benjamin Disraeli
Statistics are meant to provide objective information to the public, but they are sometimes misused by groups pursuing their own profit. One example of lying with statistics is a company claiming that 90% of people are satisfied with its products. You may believe this claim, but if the sample consists of only 10 people and 9 of them are employees of the company, I am sure you will change your mind.
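The satisfaction claim can be made concrete with a minimal Python sketch. The respondent list here is hypothetical: it assumes that the nine satisfied answers all come from employees and that the one outside respondent is not satisfied, which the example itself does not state.

```python
# Hypothetical survey data: each respondent is (is_employee, is_satisfied).
# ASSUMPTION: the nine employees are the satisfied ones and the single
# outside respondent is not. The original example does not specify this.
respondents = [(True, True)] * 9 + [(False, False)]


def satisfaction_rate(rows):
    """Return the share of satisfied respondents, or None for an empty list."""
    if not rows:
        return None
    return sum(1 for _, satisfied in rows if satisfied) / len(rows)


# The headline number uses everyone in the sample.
overall = satisfaction_rate(respondents)

# Dropping employees leaves only the respondents without a stake in the result.
independent = [row for row in respondents if not row[0]]
independent_rate = satisfaction_rate(independent)

print(f"Headline satisfaction: {overall:.0%} of {len(respondents)} respondents")
print(f"Independent respondents: {len(independent)}")
print(f"Satisfaction among them: {independent_rate:.0%}")
```

Under these assumptions the 90% headline rests on a single independent answer, so asking who was sampled matters as much as the percentage itself.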
As another example, imagine that you want to apply to a start-up company. You read in the newspaper that the average salary at the company is 24.5 million rupiah per month. You apply, are hired, and receive a salary of 5 million per month. Of course you feel tricked, so you investigate further and find that the director is paid 200 million per month while the 9 other employees are paid 5 million per month each. The mean is therefore (200 + 9 × 5) / 10 = 24.5 million, while the median is only 5 million. The published figure is technically correct, but it hides how the salaries are actually distributed. Some parties deliberately leave out part of the data in this way to benefit themselves.
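The arithmetic behind the salary example can be checked with a short Python sketch. It uses the ten salaries described above (one director at 200 million and nine employees at 5 million rupiah per month). The variable names are only illustrative.

```python
from statistics import mean, median, mode

# Monthly salaries in millions of rupiah, taken from the example:
# one director at 200 and nine other employees at 5 each.
salaries = [200] + [5] * 9

mean_salary = mean(salaries)      # pulled upward by the single large salary
median_salary = median(salaries)  # the middle of the sorted salaries
mode_salary = mode(salaries)      # the most common salary

print(f"Mean:   {mean_salary} million")
print(f"Median: {median_salary} million")
print(f"Mode:   {mode_salary} million")
```

With these numbers the mean comes out at 24.5 million, while the median and the mode are both 5 million. A single extreme value is enough to make the mean describe nobody in the company.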
Epilogue
In his book, Huff also gives recommendations on how to avoid being misled by statistics that were deliberately made to benefit certain parties. First, check who published the statistical product and whether that publisher is a neutral party. Second, check the sampling method and whether the sample is neutral and representative. Third, do not trust the mean too easily, because the mean, median, and mode can each give a different impression of the same data, as in the company salary example above.
I suggest always being skeptical of any information we receive and trying to get it from two or more sources. We must learn that the headline result matters less than how the data was obtained, processed, and presented. That way, we can understand information better and in context, and not be easily taken in by misinformation. Most importantly, statistics themselves are never wrong; it is humans who deliberately make them misleading. Always remember Benjamin Disraeli’s words: “There are three types of lies – lies, damn lies, and statistics”.
If you enjoyed this post on Sentiment analysis and interpreting data through data viewpoints, feel free to get in touch with me (Febrian Nur Alam) regarding any thoughts or queries!
Please read the rest of the series "Data can be deceiving": Garbage in garbage out (GIGO), Example of Garbage In Garbage Out, and How to lie with statistics.