Inferential Statistics - Febrian Nur Alam

Have you ever read a study whose findings did not match your own experience? For example, a study claims that it takes a fresh graduate three months to find a job, but it took you five months. A study like this is a product of inferential statistics, which aims to draw conclusions about a population. But why do its predictions still miss in individual cases? Are these studies unreliable?

Statistics can be classified into two groups based on how conclusions are drawn: descriptive statistics and inferential statistics. Descriptive statistics summarize the data you actually have, while inferential statistics use sample data to draw conclusions or make predictions about the wider population.

Inferential statistics is a method often used to draw conclusions about a population from a sample. In this article, I will discuss inferential statistics: its definition, the basic ideas behind how it works, and the things you need to consider when using it.

What is inferential statistics?

Inferential statistics takes its name from the word “inference,” which means a conclusion reached on the basis of evidence. In general, it is the process of analyzing a random sample and drawing conclusions about the population it came from. Those conclusions concern parameters that are defined in advance, and the samples taken must be random. Inferential statistics can also help you assess the relationship between a dependent variable and an independent variable.

The function of inferential statistics

Inferential statistics serve to provide an overview of a population. The goal is not only to find the mean, median, and mode, but also to draw conclusions using statistical calculations, for example:

Is the drug that was tested effective or just a placebo?
Is the difference in opinion between men and women statistically significant?
Will increasing the number of hours students study improve their grades?

How inferential statistics work

In general, statistics cannot prove anything with certainty, but we use inferential statistics to decide whether to reject a hypothesis based on probabilities. Here are some things you need to understand in order to learn inferential statistics.

H₀ or null hypothesis

When conducting research, you will have initial assumptions about the final results of your research. Inferential statistics work by making an initial assumption, called H₀. The null hypothesis, or H₀, states that there is no relationship between the variables you are studying, or no difference between the groups you are comparing. Inferential statistics test this assumption: the data either give you enough evidence to reject H₀ or they do not. When you reject the initial assumption, you accept the alternative hypothesis, H₁. Here is one example.

H₀ or null hypothesis: The drug does not lower blood pressure
H₁ or alternative hypothesis: The drug lowers blood pressure

You can see that H₀ is deliberately constructed so that there is no relationship between the variables you are studying. This is because it is easier, statistically, to reject a claim than to confirm one. Rejecting is a simple yes-or-no decision about whether the data are compatible with H₀, whereas confirming a claim directly would require a much longer argument.

p-value

The p-value, or probability value, tells you how compatible your data are with the null hypothesis. It is the probability of obtaining a result at least as extreme as the one you observed if H₀ were true, and it is calculated from your statistical test.

In inferential statistical analysis, the general rule states that if the p-value ≤ 0.05, then the null hypothesis (H₀) is rejected and the data support H₁. Conversely, if the p-value > 0.05, then there is insufficient evidence to reject the initial hypothesis, so H₀ is not rejected. The table below summarizes the two cases:

P-value condition	Decision on H₀	Implication for H₁	Note
p-value ≤ 0.05	H₀ is rejected	H₁ is supported by the data	There is sufficient statistical evidence to support H₁.
p-value > 0.05	H₀ is not rejected	H₁ is not supported by the data	This does not mean that H₀ is true; it simply means that the evidence is insufficient to reject H₀.

The p-value is usually calculated automatically by the software you are using, for example R, SPSS, or Python.
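As a rough illustration of what such software does behind the scenes, the sketch below computes a p-value by hand for the blood-pressure example, using an exact permutation test (one of several tests that could be used here). The readings, the group sizes, and the function name `permutation_p_value` are hypothetical, and only the Python standard library is assumed.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(treated, control):
    """One-sided exact permutation test.

    H0: the drug does not lower blood pressure (group labels are interchangeable).
    H1: the drug lowers blood pressure (treated mean < control mean).
    """
    pooled = treated + control
    n_treated = len(treated)
    n_control = len(control)
    total = sum(pooled)
    observed = mean(treated) - mean(control)

    as_extreme = 0
    n_splits = 0
    # Enumerate every way of relabeling n_treated of the pooled readings as "treated".
    for idx in combinations(range(len(pooled)), n_treated):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_treated - (total - sum_a) / n_control
        # Count relabelings at least as extreme as the observed difference
        # (the small tolerance guards against floating-point ties).
        if diff <= observed + 1e-9:
            as_extreme += 1
        n_splits += 1
    return as_extreme / n_splits

# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
treated = [128, 131, 125, 130, 127]
control = [138, 135, 141, 136, 139]

p_value = permutation_p_value(treated, control)
print(f"p-value = {p_value:.4f}")
```

With these made-up readings, every treated value is lower than every control value, so only 1 of the 252 possible relabelings is as extreme as the observed one. The p-value is therefore 1/252, or about 0.004.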

In general, the threshold used to determine whether a result is significant is 0.05, but there are some cases where researchers use different values. This threshold, known as the alpha value (α), is the significance level set by the researcher before the analysis begins.

For example, in a study on the relationship between the amount of fertilizer added and crop yield, the p-value was found to be 0.02, while the alpha value used was 0.03. Because 0.02 ≤ 0.03, H₀ is rejected: there is a statistically significant relationship between the amount of fertilizer added and the crop yield produced.
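The decision rule itself is a single comparison. Here is a minimal sketch, where the helper name `decide` is hypothetical and the first call uses the numbers from the fertilizer example:

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the p-value is at most alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# Fertilizer example from the text: p-value 0.02, alpha 0.03.
print(decide(0.02, alpha=0.03))  # prints "reject H0"

# A hypothetical p-value just above the same alpha.
print(decide(0.04, alpha=0.03))  # prints "fail to reject H0"
```

Note that alpha is passed in rather than computed from the data, which mirrors the requirement that it be set before the analysis begins.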

Type I and Type II errors

In every statistical test, there are four possible outcomes:

Actual condition of H₀	Statistical decision	Result
H₀ is true	H₀ is not rejected	Correct
H₀ is true	H₀ is rejected	Type I error
H₀ is false	H₀ is rejected	Correct
H₀ is false	H₀ is not rejected	Type II error

Any inferential statistical analysis carries a risk of error. There are two types:

Type I error: incorrectly rejecting a true null hypothesis, also known as a false positive
Type II error: failing to reject a false null hypothesis, also known as a false negative
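Both error types can be observed in a small simulation. The sketch below is only an illustration under stated assumptions: the data are normally distributed with a known standard deviation of 1, the test is a two-sided z-test of H₀ "the population mean is 0," and the sample size, the effect size of 0.3, and the function names are all hypothetical choices.

```python
import random
from statistics import NormalDist, fmean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided z-test p-value for H0: population mean == mu0 (sigma assumed known)."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mean, n=30, trials=5000, alpha=0.05, seed=0):
    """Fraction of simulated studies in which H0 (mean == 0) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample) <= alpha:
            rejections += 1
    return rejections / trials

# H0 is true (the mean really is 0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false (the mean is really 0.3): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.3)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

Under these assumptions, the Type I error rate should land close to the chosen alpha of 0.05, while the Type II error rate depends on the sample size and on how far the true mean is from the value stated in H₀.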

Standard error (SE)

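To perform inferential statistical analysis, you must understand the conditions of the data being analyzed. One thing to note is the standard error. The standard error measures how much a sample statistic, such as the mean, would vary if you drew new samples from the same population again and again. For the mean, it is estimated as the sample standard deviation divided by the square root of the sample size. The smaller the standard error, the more confident we are in the estimates we produce.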
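For the sample mean, the standard error is commonly estimated as the sample standard deviation divided by the square root of the sample size. Here is a minimal sketch of that calculation, where the job-search figures and the function name `standard_error` are hypothetical:

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(sample):
    """Estimated standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

# Hypothetical number of months eight graduates needed to find a job.
months_to_job = [1, 2, 2, 3, 3, 3, 4, 6]

se = standard_error(months_to_job)
print(f"mean = {mean(months_to_job):.2f} months, standard error = {se:.2f} months")
```

Because the sample size sits under a square root, quadrupling the number of observations only halves the standard error, all else being equal.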

How to minimize errors

After learning about the important aspects of inferential statistics, the next step is to learn about the factors that can reduce the potential for error when you use them. Basically, the purpose of inferential statistics is to provide an overview of data, not to find definitive answers to all problems. The following are things to consider in order to get the most out of inferential statistics.

Make sure the sample is large enough

The most fundamental aspect of inferential statistics is ensuring that the sample size is adequate, because a small sample size is a source of error in analysis. If the sample is too small, the accuracy of the results is difficult to trust.
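To see why small samples are hard to trust, the sketch below repeatedly draws samples of different sizes from the same population and measures how much the sample mean jumps around from one sample to the next. The population (job-search times with a mean of 3 months and a standard deviation of 1.5 months, treated as normally distributed) is a made-up assumption for illustration, as are the sample sizes.

```python
import random
from statistics import fmean, stdev

def spread_of_sample_means(n, trials=1000, seed=1):
    """Standard deviation of the sample mean across many repeated samples of size n."""
    rng = random.Random(seed)
    sample_means = []
    for _ in range(trials):
        # Hypothetical population: mean 3 months, standard deviation 1.5 months.
        # (A normal distribution is a simplification; it can yield negative values.)
        sample = [rng.gauss(3.0, 1.5) for _ in range(n)]
        sample_means.append(fmean(sample))
    return stdev(sample_means)

spreads = {n: spread_of_sample_means(n) for n in (5, 50, 500)}
for n, spread in spreads.items():
    print(f"n = {n:>3}: sample means vary by about {spread:.2f} months")
```

The larger the sample, the more tightly the sample means cluster around the population mean, so a conclusion drawn from any single sample becomes more dependable.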

Set the significance level (α)

Make sure that the α you set is appropriate for the context of your research. This is because an α that is too large increases the risk of a Type I error, while an α that is too small increases the risk of a Type II error. In most cases, the significance level used for inferential statistics is 5%.
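The trade-off between the two error types can be made concrete with a short calculation. The sketch below assumes a two-sided z-test with a known standard deviation. The effect size of 0.3, the sample size of 30, and the function name `type_2_error` are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def type_2_error(alpha, effect=0.3, sigma=1.0, n=30):
    """Probability of failing to reject a false H0 in a two-sided z-test.

    Assumes the true mean differs from the H0 value by `effect`
    and that the population standard deviation `sigma` is known.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # rejection threshold for |z|
    shift = effect * n ** 0.5 / sigma            # expected z when H0 is false
    power = (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)
    return 1 - power

for alpha in (0.01, 0.05, 0.10):
    # When H0 is true, the Type I error risk equals alpha by construction.
    print(f"alpha = {alpha:.2f}  Type I risk = {alpha:.2f}  "
          f"Type II risk = {type_2_error(alpha):.2f}")
```

In this sketch, raising α lowers the Type II risk and lowering α raises it, which is why the level should be chosen according to which mistake is more costly in your research context.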

Clean your data

If the data you enter for analysis contains too much “dirty” data, the results will be poor no matter how well you perform the calculations. In data science, this is known as garbage in, garbage out.

That is all for today’s discussion. Inferential statistics is not intended to provide definitive answers about every individual, but rather to provide a general overview of a group. Therefore, it is only natural that you may read research results that differ from your own personal experience.
