
After understanding the meaning of garbage in, garbage out (GIGO), you may wonder what an example of it looks like. In his book Naked Statistics, Charles Wheelan discusses what people often call ‘how to lie with statistics.’ Many believe that poor statistical results stem from flawed statistical processes, but in many cases the cause is simpler: bad input data leads to bad analysis results.
Examples of GIGO
Survivor bias
During World War II, military researchers wanted to improve the design of their aircraft, believing that the current design caused many casualties. They examined every aircraft that returned from the battlefield and mapped out which parts had taken the most hits. The results obtained are as follows.
After the repairs, the number of plane crashes did not decrease, so Abraham Wald took a different approach. He argued that to reduce the number of crashes, you need to study the planes that crashed, not the ones that came back. On the planes that returned safely, the areas with holes therefore did not need thicker armor, because a plane hit there could still make it home. The armor belonged on the areas where the returning planes showed no holes, since planes hit there did not return.
Survivor bias (also called survivorship bias) is a logical error that leads to wrong conclusions. It happens when we focus only on the cases that succeeded and ignore the ones that did not, so the data behind the decision is incomplete.
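To make the effect concrete, here is a minimal simulation sketch. The section names, the loss probabilities per hit, and the number of hits per plane are all made-up values for illustration, not historical figures. Hits land on every section equally often, yet the planes that return show very few hits on the critical sections.

```python
import random
from collections import Counter

# Hypothetical sections and loss probabilities per hit (illustrative values only).
LOSS_PROB_PER_HIT = {"engine": 0.6, "cockpit": 0.4, "fuselage": 0.05, "wings": 0.05}

def simulate_hits(n_planes=10_000, hits_per_plane=3, seed=42):
    """Return hit counts per section for all planes and for survivors only."""
    rng = random.Random(seed)
    sections = list(LOSS_PROB_PER_HIT)
    all_hits, survivor_hits = Counter(), Counter()
    for _ in range(n_planes):
        # Every section is equally likely to be hit.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        # The plane is lost if any single hit turns out to be fatal.
        lost = any(rng.random() < LOSS_PROB_PER_HIT[s] for s in hits)
        all_hits.update(hits)
        if not lost:
            survivor_hits.update(hits)
    return all_hits, survivor_hits

def shares(counter):
    """Convert raw hit counts into the share of hits per section."""
    total = sum(counter.values())
    return {section: counter[section] / total for section in LOSS_PROB_PER_HIT}

all_hits, survivor_hits = simulate_hits()
all_share = shares(all_hits)
survivor_share = shares(survivor_hits)

for section in LOSS_PROB_PER_HIT:
    print(f"{section:9s} all planes: {all_share[section]:.2f}   "
          f"survivors only: {survivor_share[section]:.2f}")
```

If you only look at the survivors column, the engine looks like the safest place to be hit, which is exactly the wrong conclusion.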
Publication bias
Publication bias is the next example of garbage in, garbage out. It is the tendency of researchers to publish their research only when it produces positive results.
Researchers do this because research with positive results tends to be valued more than research that finds nothing. In addition, researchers risk losing sponsorship from their funders if their research does not produce results. Under these pressures, researchers create publication bias without realizing it.
To determine whether a research topic experiences publication bias, researchers can use visualization. In the left image, the distribution appears fairly even, indicating low publication bias. In contrast, the right image shows an uneven distribution, suggesting high publication bias.
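One common visualization for this is a funnel plot, which plots each study's estimate against its precision. The sketch below does not draw the plot; it simulates the kind of data behind one. It assumes a true effect of zero, unit variance, and a hypothetical publication rule where only "significantly positive" studies (z above 1.96) get published. All names and numbers are illustrative.

```python
import random
import statistics

def simulate_studies(n_studies=2000, true_effect=0.0, seed=7):
    """Simulate (estimate, standard_error) pairs for studies of varying size."""
    rng = random.Random(seed)
    studies = []
    for _ in range(n_studies):
        n = rng.randint(10, 200)       # sample size of this study
        se = 1.0 / n ** 0.5            # standard error, assuming unit variance
        estimate = rng.gauss(true_effect, se)
        studies.append((estimate, se))
    return studies

studies = simulate_studies()

# Hypothetical publication rule: only significantly positive results are published.
published = [(est, se) for est, se in studies if est / se > 1.96]

mean_all = statistics.mean([est for est, _ in studies])
mean_published = statistics.mean([est for est, _ in published])

print(f"studies run:       {len(studies)}")
print(f"studies published: {len(published)}")
print(f"mean effect, all studies:       {mean_all:.3f}")
print(f"mean effect, published studies: {mean_published:.3f}")
```

Although the true effect is zero, the published studies alone suggest a clearly positive effect. Anyone who analyzes only the published literature is feeding garbage into their analysis.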
Selection bias
In my home country, Indonesia, a presidential election is held every 5 years, and each candidate usually conducts a quick count to boost their popularity. As a result, selection bias shows up every 5 years on a national scale, haha. It is quite funny: many institutions run a quick count, yet their results differ. For ordinary people, the conflicting quick count results can lead to long debates between supporters of each presidential candidate.
Selection bias is an error in sampling that leads to an error in the overall conclusion. Selection bias does not only occur in quick counts but can occur in many fields.
One way to avoid selection bias is to select the target sample carefully, in line with the research objectives. The goal is that every sample meets the necessary requirements, and that everyone who is eligible has the same chance of being included in the research.
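Here is a minimal sketch of why equal opportunity matters in a quick count. The polling stations, the urban and rural vote shares, and the sample size are all invented for illustration, and every station is assumed to have the same number of voters. One quick count samples only urban stations, while the other gives every station the same chance of being picked.

```python
import random

rng = random.Random(2024)

# Hypothetical polling stations: (region, share of votes for candidate A).
stations = (
    [("urban", rng.gauss(0.60, 0.05)) for _ in range(3000)]
    + [("rural", rng.gauss(0.40, 0.05)) for _ in range(7000)]
)

def mean_share(sample):
    """Average vote share for candidate A across the sampled stations."""
    return sum(share for _, share in sample) / len(sample)

true_share = mean_share(stations)

# Biased quick count: only urban stations can end up in the sample.
urban_only = [s for s in stations if s[0] == "urban"]
biased_estimate = mean_share(rng.sample(urban_only, 500))

# Simple random sample: every station has the same chance of being picked.
random_estimate = mean_share(rng.sample(stations, 500))

print(f"true share for candidate A: {true_share:.3f}")
print(f"urban-only quick count:     {biased_estimate:.3f}")
print(f"random-sample quick count:  {random_estimate:.3f}")
```

Both quick counts use the same number of stations, but only the random sample lands close to the true result. A bigger sample would not rescue the urban-only count, because the problem is who gets sampled, not how many.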
Conclusion
Garbage in, garbage out is a phenomenon that occurs in data science analysis. We can reduce the potential for garbage by using CRISP-DM, with some adjustments, since different cases require different approaches. By using the right method, I hope we can get better at analyzing data and producing sound analysis products.
If you enjoyed this post on Sentiment analysis and interpreting data through data viewpoints, feel free to get in touch with me (Febrian Nur Alam) regarding any thoughts or queries!
Please read the rest of the series, Data Can Be Deceiving: Garbage in garbage out (GIGO), Example of Garbage In Garbage Out, and How to lie with statistics.

