
## Chi-Square Test of Independence | Formula, Guide & Examples

Published on May 30, 2022 by Shaun Turney. Revised on November 10, 2022.


## Contingency tables

They reorganize the data into a contingency table:

They also visualize their data in a bar graph:

- Null hypothesis (H₀): Variable 1 and variable 2 are not related in the population; the proportions of variable 1 are the same for different values of variable 2.
- Alternative hypothesis (Hₐ): Variable 1 and variable 2 are related in the population; the proportions of variable 1 are not the same for different values of variable 2.

- Null hypothesis (H₀): Whether a household recycles and the type of intervention they receive are not related in the population; the proportion of households that recycle is the same for all interventions.
- Alternative hypothesis (Hₐ): Whether a household recycles and the type of intervention they receive are related in the population; the proportion of households that recycle is not the same for all interventions.

## Expected values


The following conditions are necessary if you want to perform a chi-square test of independence:

- Chi-square tests of independence are usually performed on binary or nominal variables. They are sometimes performed on ordinal variables, although generally only on ordinal variables with fewer than five groups.
- The sample was randomly selected from the population.
- There are a minimum of five observations expected in each combination of groups.

The recycling example meets these conditions:

- They want to test a hypothesis about the relationship between two categorical variables: whether a household recycles and the type of intervention.
- They recruited a random sample of 300 households.
- There are a minimum of five observations expected in each combination of groups. The smallest expected frequency is 12.57.
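The expected-frequency condition can be checked mechanically. This is a minimal sketch, assuming the recycling counts are the ones in the R matrix later in this article (column 1 = recycles, column 2 = doesn't recycle); the row labels pamphlet, phone call, and control are an assumption based on the post hoc comparisons, and `expected_counts` is a hypothetical helper name.

```python
# Observed counts for the recycling example, read from the R matrix in this
# article. Row labels are an assumption.
observed = [
    [89, 9],   # pamphlet: recycles, doesn't recycle
    [84, 8],   # phone call
    [86, 24],  # control
]


def expected_counts(table):
    """Expected count for every cell: (row total * column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
smallest = min(min(row) for row in expected)
print(f"smallest expected count: {smallest:.2f}")    # 12.57
print("at least five in every cell:", smallest >= 5)  # True
```

The smallest cell is 92 × 41 / 300 ≈ 12.57 (a row total of 92 times the "doesn't recycle" column total of 41), which matches the value quoted above.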

Pearson’s chi-square (χ²) is the test statistic for the chi-square test of independence:

$ \chi^2 = \sum \frac{(O - E)^2}{E} $

- χ² is the chi-square test statistic
- Σ is the summation operator (it means “take the sum of”)
- O is the observed frequency
- E is the expected frequency

Follow these five steps to calculate the test statistic:

## Step 1: Create a table

Create a table with the observed and expected frequencies in two columns.

## Step 2: Calculate O − E

In a new column called “O − E”, subtract the expected frequencies from the observed frequencies.

## Step 3: Calculate (O − E)²

In a new column called “(O − E)²”, square the values in the previous column.

## Step 4: Calculate (O − E)² / E

In a final column called “(O − E)² / E”, divide the previous column by the expected frequencies.

## Step 5: Calculate χ²

Finally, add up the values of the previous column to calculate the chi-square test statistic (χ²).
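The five steps can be sketched in Python as one table with a column per step. This assumes the recycling counts from the R matrix in this article, and it computes the expected frequencies as row total × column total / n. It is an illustration, not the article's own tool.

```python
# Step 1: observed (O) counts; expected (E) counts are derived from the totals.
observed = [[89, 9], [84, 8], [86, 24]]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

rows = []
for i, obs_row in enumerate(observed):
    for j, o in enumerate(obs_row):
        e = row_totals[i] * col_totals[j] / n
        diff = o - e          # Step 2: O - E
        sq = diff ** 2        # Step 3: (O - E)^2
        contrib = sq / e      # Step 4: (O - E)^2 / E
        rows.append((o, e, diff, sq, contrib))

print(f"{'O':>4} {'E':>8} {'O-E':>8} {'(O-E)^2':>9} {'(O-E)^2/E':>10}")
for o, e, diff, sq, contrib in rows:
    print(f"{o:>4} {e:>8.2f} {diff:>8.2f} {sq:>9.2f} {contrib:>10.2f}")

chi_square = sum(r[4] for r in rows)     # Step 5: sum the last column
print(f"chi-square = {chi_square:.2f}")  # chi-square = 9.79
```

Each row of the printed table corresponds to one cell of the contingency table, so the table has 3 × 2 = 6 rows.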

## Step 1: Calculate the expected frequencies

Use the contingency table to calculate the expected frequencies following the formula:

$ E = \frac{\text{row total} \times \text{column total}}{n} $

## Step 2: Calculate chi-square

Use Pearson’s chi-square formula to calculate the test statistic, following the five steps described above.

## Step 3: Find the critical chi-square value

- The degrees of freedom (df): For a chi-square test of independence, the df is (number of variable 1 groups − 1) × (number of variable 2 groups − 1).
- Significance level (α): By convention, the significance level is usually .05.
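Critical values are normally read from a chi-square table or from a library such as `scipy.stats.chi2.ppf`. The sketch below avoids any third-party dependency. It computes the upper-tail probability from the power series of the regularized incomplete gamma function and then finds the critical value by bisection. The helper names `chi2_sf` and `chi2_critical` are hypothetical, and the 3 × 2 shape is the recycling example.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df
    degrees of freedom, via the power series of the regularized lower
    incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    s, t = df / 2, x / 2
    term = 1.0 / s           # n = 0 term of the series
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= t / (s + k)  # ratio between consecutive series terms
        total += term
        k += 1
    lower = total * math.exp(s * math.log(t) - t - math.lgamma(s))
    return max(0.0, 1.0 - lower)


def chi2_critical(alpha, df):
    """Critical value c with P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains c
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Recycling example: 3 intervention groups x 2 outcome groups.
df = (3 - 1) * (2 - 1)
crit = chi2_critical(0.05, df)
print(f"df = {df}, critical value = {crit:.3f}")  # df = 2, critical value = 5.991
```

The same function reproduces the other table values used in this document, such as 3.841 for df = 1 and 7.815 for df = 3.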

## Step 4: Compare the chi-square value to the critical value

## Step 5: Decide whether to reject the null hypothesis

- If the chi-square value is greater than the critical value, the data allows you to reject the null hypothesis that the variables are unrelated and provides support for the alternative hypothesis that the variables are related.
- If the chi-square value is less than the critical value, the data doesn’t allow you to reject the null hypothesis that the variables are unrelated and doesn’t provide support for the alternative hypothesis that the variables are related.
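Steps 4 and 5 reduce to a single comparison. This is a minimal sketch with a hypothetical function name, `chi_square_decision`. The example statistics are values quoted elsewhere in this document: the movie-snack test and the gender and online learning test. For the second example, 3.841 is assumed as the standard α = 0.05 critical value for df = 1.

```python
def chi_square_decision(statistic, critical_value):
    """Critical-value rule: reject H0 when the statistic exceeds the critical value."""
    if statistic > critical_value:
        return "reject H0: evidence the variables are related"
    return "fail to reject H0: no evidence the variables are related"


print(chi_square_decision(65.03, 7.815))  # movie snacks, df = 3
print(chi_square_decision(0.743, 3.841))  # gender and online learning, df = 1
```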

## Step 6: Follow up with post hoc tests (optional)

- Since there are two intervention groups and two outcome groups for each test, there is (2 − 1) * (2 − 1) = 1 degree of freedom.
- There are three tests, so the significance level with a Bonferroni correction applied is α = .05 / 3 ≈ .0167, rounded down here to .016 (slightly more conservative).
- For a test of significance at α = .016 and df = 1, the χ² critical value is 5.803.
- The chi-square value is greater than the critical value for the pamphlet vs. control and phone call vs. control tests.
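The post hoc comparisons can be sketched as three 2 × 2 Pearson tests (without a continuity correction) against the Bonferroni-corrected critical value. The sketch makes two assumptions. First, the rows of the R matrix in this article map to pamphlet, phone call, and control in that order. Second, the df = 1 critical value is the square of the two-sided standard normal quantile, which is equivalent to reading it from a chi-square table.

```python
from itertools import combinations
from statistics import NormalDist


def pearson_chi_square(table):
    """Pearson's chi-square statistic, no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    return stat


# Assumed row mapping: [recycles, doesn't recycle] per intervention.
groups = {"pamphlet": [89, 9], "phone call": [84, 8], "control": [86, 24]}

alpha = 0.016  # Bonferroni level used in the text (.05 / 3, rounded down)
# For df = 1, the chi-square critical value equals the squared two-sided z value.
critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2

results = {}
for a, b in combinations(groups, 2):
    stat = pearson_chi_square([groups[a], groups[b]])
    results[(a, b)] = stat
    verdict = "significant" if stat > critical else "not significant"
    print(f"{a} vs. {b}: chi-square = {stat:.2f} ({verdict})")
print(f"critical value = {critical:.3f}")
```

Only the two comparisons against the control group exceed the critical value, which is consistent with the conclusion above.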

## When to use the chi-square goodness of fit test

## When to use Fisher’s exact test

## When to use McNemar’s test

## When to use a G test

One reason to prefer chi-square tests is that they’re more familiar to researchers in most fields.


m = matrix(data = c(89, 84, 86, 9, 8, 24), nrow = 3, ncol = 2)

## Cite this Scribbr article

Turney, S. (2022, November 10). Chi-Square Test of Independence | Formula, Guide & Examples. Scribbr. Retrieved February 28, 2023, from https://www.scribbr.com/statistics/chi-square-test-of-independence/



JMP | Statistical Discovery.™ From SAS.

## Statistics Knowledge Portal

A free online introduction to statistics

## Chi-Square Test of Independence

## What is the chi-square test of independence?

## When can I use the test?

You can use the test when you have counts of values for two categorical variables.

## Can I use the test if I have frequency counts in a table?

Yes. If you have only a table of values that shows frequency counts, you can use the test.

## Using the Chi-square test of independence

## What do we need?

- We have a list of movie genres; this is our first variable. Our second variable is whether or not the patrons of those genres bought snacks at the theater. Our idea (or, in statistical terms, our null hypothesis) is that the type of movie and whether or not people bought snacks are unrelated. The owner of the movie theater wants to estimate how many snacks to buy. If movie type and snack purchases are unrelated, estimating will be simpler than if the movie types impact snack sales.
- A veterinary clinic has a list of dog breeds they see as patients. The second variable is whether owners feed dry food, canned food or a mixture. Our idea is that the dog breed and types of food are unrelated. If this is true, then the clinic can order food based only on the total number of dogs, without consideration for the breeds.

- Data values that are a simple random sample from the population of interest.
- Two categorical or nominal variables. Don't use the independence test with continuous variables that define the category combinations. However, the counts for the combinations of the two categorical variables are numeric (whole-number counts).
- For each combination of the levels of the two variables, we need an expected count of at least five. When any combination has an expected count below five, the test results are not reliable.

## Chi-square test of independence example

- We have a simple random sample of 600 people who saw a movie at our theater. We meet this requirement.
- Our variables are the movie type and whether or not snacks were purchased. Both variables are categorical. We meet this requirement.
- The last requirement is an expected count of at least five for each combination of the two variables. To confirm this, we need to know the total counts for each type of movie and the total counts for whether snacks were bought or not. For now, we assume we meet this requirement and will check it later.

Here is our data summarized in a contingency table:

## Table 1: Contingency table for movie snacks data

## Finding expected counts

## Table 2: Contingency table for movie snacks data with row and column totals

$ \frac{125\times310}{600} = \frac{38{,}750}{600} \approx 64.58 \approx 65 $
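The single-cell calculation above is a one-liner. This is a minimal sketch with a hypothetical helper name. It uses the row total 125, the column total 310, and N = 600 from the formula, and it shows that the exact expected count is about 64.58 before rounding to 65.

```python
def expected_count(row_total, col_total, n):
    """Expected count for one cell under independence: row total x column total / N."""
    return row_total * col_total / n


e = expected_count(125, 310, 600)
print(round(e, 2))  # 64.58
print(round(e))     # 65
```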

## Table 3: Contingency table for movie snacks data showing actual count vs. expected count

## Performing the test

## Table 4: Preparing to calculate our test statistic

Lastly, to get our test statistic, we add the numbers in the final row for each cell:

$ 3.29 + 3.52 + 5.81 + 6.21 + 12.65 + 13.52 + 9.68 + 10.35 = 65.03 $

- We decide on the risk we are willing to take of concluding that the two variables are not independent when in fact they are. For the movie data, we had decided prior to our data collection that we are willing to take a 5% risk of saying that the two variables – Movie Type and Snack Purchase – are not independent when they really are independent. In statistics-speak, we set the significance level, α, to 0.05.
- We calculate a test statistic. As shown above, our test statistic is 65.03.
- We find the critical value from the Chi-square distribution based on our degrees of freedom and our significance level. If the two variables are independent, the test statistic exceeds this value with probability α (here, 5%).
- The degrees of freedom depend on how many rows and how many columns we have. The degrees of freedom (df) are calculated as: $ \text{df} = (r-1)\times(c-1) $ In the formula, r is the number of rows, and c is the number of columns in our contingency table. From our example, with Movie Type as the rows and Snack Purchase as the columns, we have: $ \text{df} = (4-1)\times(2-1) = 3\times1 = 3 $ The Chi-square value with α = 0.05 and three degrees of freedom is 7.815.
- We compare the value of our test statistic (65.03) to the Chi-square value. Since 65.03 > 7.815, we reject the idea that movie type and snack purchases are independent.
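The same decision can be expressed as a p-value. This is a minimal sketch that uses only Python's standard library. It relies on the fact that for df = 3 the chi-square upper tail has the closed form erfc(√(x/2)) + √(2x/π)·e^(−x/2), so no statistics package is needed. The eight cell contributions are the ones summed in this section, and `chi2_sf_df3` is a hypothetical helper name.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) of a chi-square with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


contributions = [3.29, 3.52, 5.81, 6.21, 12.65, 13.52, 9.68, 10.35]
statistic = sum(contributions)
df = (4 - 1) * (2 - 1)   # 4 movie types x 2 snack outcomes

print(f"test statistic = {statistic:.2f}, df = {df}")  # test statistic = 65.03, df = 3
print(f"P(X > 7.815) = {chi2_sf_df3(7.815):.3f}")      # about 0.05, the alpha used here
print("reject independence:", chi2_sf_df3(statistic) < 0.05)  # True
```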

## Understanding results

Let’s use graphs to understand the test and the results.

## Statistical details

Let’s look at the movie-snack data and the Chi-square test of independence using statistical terms.

$ H_0: \text{Movie Type and Snack purchases are independent} $

The alternative hypothesis is the opposite.

$ H_a: \text{Movie Type and Snack purchases are not independent} $

Before we calculate the test statistic, we find the expected counts. This is written as:

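$ E_{ij} = \frac{R_i \times C_j}{N} $

Here $R_i$ is the total for row $i$, $C_j$ is the total for column $j$, and $N$ is the total number of observations.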

We calculate the test statistic using the formula below:

$ \chi^2 = \sum_{i,j} \frac{(O_{ij}-E_{ij})^2}{E_{ij}} $

There are two possible results from our comparison:

- The test statistic is lower than the critical Chi-square value. You fail to reject the hypothesis of independence. In the movie-snack example, the theater owner can go ahead with the assumption that the type of movie a person sees has no relationship with whether or not they buy snacks.
- The test statistic is higher than the critical Chi-square value. You reject the hypothesis of independence. In the movie-snack example, the theater owner cannot assume that there is no relationship between the type of movie a person sees and whether or not they buy snacks.

## Understanding p-values

## Using Chi-Square Statistic in Research

Is there a significant relationship between voter intent and political party membership?

How does the Chi-Square statistic work?


The calculation of the Chi-Square statistic is quite straightforward and intuitive:

$ \chi^2 = \sum \frac{(O-E)^2}{E} $

How is the Chi-Square statistic run in SPSS and how is the output interpreted?

What are special concerns with regard to the Chi-Square statistic?

## Sociology 3112

Learning objectives:

- Understand the characteristics of the chi-square distribution
- Carry out the chi-square test and interpret its results
- Understand the limitations of the chi-square test

## The Chi-Square Distribution

## The Chi-Square Test

Gender and Getting in Trouble at School

As is customary in the social sciences, we'll set our alpha level at 0.05.

With these sets of figures, we calculate the chi-square statistic as follows:

## The Limitations of the Chi-Square Test

## Main Points

- The chi-square distribution is actually a series of distributions that vary in shape according to their degrees of freedom.
- The chi-square test is a hypothesis test designed to test for a statistically significant relationship between nominal and ordinal variables organized in a bivariate table. In other words, it tells us whether two variables are independent of one another.
- The obtained chi-square statistic essentially summarizes the difference between the frequencies actually observed in a bivariate table and the frequencies we would expect to see if there were no relationship between the two variables.
- The chi-square test is sensitive to sample size.
- The chi-square test cannot establish a causal relationship between two variables.

## Carrying out the Chi-Square Test in SPSS

- Using the World Values Survey data, run a chi-square test to determine whether there is a relationship between sex ("SEX") and marital status ("MARITAL"). Report the obtained statistic and the p-value from your output. What is your conclusion?
- Using the ADD Health data, run a chi-square test to determine whether there is a relationship between the respondent's gender ("GENDER") and his or her grade in math ("MATH"). Again, report the obtained statistic and the p-value from your output. What is your conclusion?

## 11.3 - Chi-Square Test of Independence

Again, we will be using the five-step hypothesis testing procedure:

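\(H_0:\) There is not a relationship between the two variables in the population (they are independent)

\(H_a:\) There is a relationship between the two variables in the population (they are dependent)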

If \(p \leq \alpha\) reject the null hypothesis.

If \(p>\alpha\) fail to reject the null hypothesis.

Write a conclusion in terms of the original research question.

## 11.3.1 - Example: Gender and Online Learning

\(\chi^2=\sum \dfrac{(O-E)^2}{E} \)

The chi-square test statistic is 0.743.

\(df=(number\;of\;rows-1)(number\;of\;columns-1)=(2-1)(2-1)=1\)

\(p>\alpha\), therefore we fail to reject the null hypothesis.

## 11.3.2 - Minitab: Test of Independence

## Minitab ® – Chi-square Test Using Raw Data

- Null hypothesis : Seat location and cheating are not related in the population.
- Alternative hypothesis : Seat location and cheating are related in the population.

To perform a chi-square test of independence in Minitab using raw data:

- Open Minitab file: class_survey.mpx
- Select Stat > Tables > Chi-Square Test for Association
- Select Raw data (categorical variables) from the dropdown.
- Choose the variable Seating to insert it into the Rows box
- Choose the variable Ever_Cheat to insert it into the Columns box
- Click the Statistics button and check the boxes Chi-square test for association and Expected cell counts
- Click OK and OK

This should result in the following output:

## Rows: Seating Columns: Ever_Cheat

## 11.3.2.1 - Example: Raw Data

Let's use Minitab to calculate the test statistic and p-value.

- After entering the data, select Stat > Tables > Cross Tabulation and Chi-Square
- Enter Dog in the Rows box
- Enter Cat in the Columns box
- Select the Chi-Square button and in the new window check the box for the Chi-square test and Expected cell counts

## Rows: Dog Columns: Cat

Since the assumption was met in step 1, we can use the Pearson chi-square test statistic.

Our p value is greater than the standard 0.05 alpha level, so we fail to reject the null hypothesis.

## 11.3.2.2 - Example: Summarized Data

Example: coffee and tea preference.

Is there a relationship between liking tea and liking coffee?

Let's use the five-step hypothesis testing procedure to address this research question.

Assumption: All expected counts are at least 5.

- Select Stat > Tables > Cross Tabulation and Chi-Square
- Select Summarized data in a two-way table from the dropdown
- Enter the columns Likes Coffee-Yes and Likes Coffee-No in the Columns containing the table box
- For the row labels enter Likes Tea (leave the column labels blank)
- Select the Chi-Square button and check the boxes for Chi-square test and Expected cell counts .

## Rows: Likes Tea Columns: Worksheet columns

Our p value is less than the standard 0.05 alpha level, so we reject the null hypothesis.

There is evidence of a relationship between liking coffee and liking tea in the population.

## 11.3.3 - Relative Risk

## Examples of Risk

45 out of 100 children get the flu each year. The risk is \(\frac{45}{100}=.45\) or 45%.

Thus, relative risk gives the risk for group 1 as a multiple of the risk for group 2.

## Example of Relative Risk

Children are 4.5 times more likely than adults to get the flu this year.
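Relative risk is a ratio of two risks. This is a minimal sketch with hypothetical helper names. The adult risk of 0.10 is not stated in the text; it is only implied by the "4.5 times" statement (0.45 / 4.5 = 0.10), so treat it as an assumption.

```python
def risk(cases, total):
    """Proportion of a group that experiences the outcome."""
    return cases / total


def relative_risk(risk_group1, risk_group2):
    """Risk in group 1 expressed as a multiple of the risk in group 2."""
    return risk_group1 / risk_group2


child_risk = risk(45, 100)  # 0.45, from the example above
adult_risk = 0.10           # assumed: implied by "4.5 times", not given directly
print(round(relative_risk(child_risk, adult_risk), 2))  # 4.5
```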

## Stats and R

Chi-square test of independence by hand.

## Introduction

- \(H_0\) : the variables are independent, there is no relationship between the two categorical variables. Knowing the value of one variable does not help to predict the value of the other variable
- \(H_1\) : the variables are dependent, there is a relationship between the two categorical variables. Knowing the value of one variable helps to predict the value of the other variable

\[\chi^2 = \sum_{i, j} \frac{\big(O_{ij} - E_{ij}\big)^2}{E_{ij}}\]

- in the subgroup of athlete and non-smoker: \(\frac{(14 - 9)^2}{9} = 2.78\)
- in the subgroup of non-athlete and non-smoker: \(\frac{(0 - 5)^2}{5} = 5\)
- in the subgroup of athlete and smoker: \(\frac{(4 - 9)^2}{9} = 2.78\)
- in the subgroup of non-athlete and smoker: \(\frac{(10 - 5)^2}{5} = 5\)

and then we sum them all to obtain the test statistic:

\[\chi^2 = 2.78 + 5 + 2.78 + 5 = 15.56\]

\[df = (\text{number of rows} - 1) \cdot (\text{number of columns} - 1) = (2 - 1) \cdot (2 - 1) = 1\]

Chi-square table - Critical value for alpha = 5% and df = 1

\[\text{test statistic} = 15.56 > \text{critical value} = 3.84146\]
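The hand calculation above can be reproduced in a few lines. The counts come from the four subgroups listed above, with rows for non-smoker and smoker and columns for athlete and non-athlete. The helper name is hypothetical. The function computes plain Pearson chi-square, matching the hand calculation. R's `chisq.test()` applies Yates' continuity correction to 2 × 2 tables by default (`correct = TRUE`), so its statistic will differ unless `correct = FALSE` is set.

```python
def pearson_chi_square(table):
    """Pearson's chi-square statistic, no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # 9 for athletes, 5 for non-athletes
            stat += (o - e) ** 2 / e
    return stat


# Rows: non-smoker, smoker; columns: athlete, non-athlete.
observed = [[14, 0], [4, 10]]
statistic = pearson_chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.84146  # alpha = 5%, df = 1

print(f"chi-square = {statistic:.2f}, df = {df}")  # chi-square = 15.56, df = 1
print("reject H0:", statistic > critical_value)    # reject H0: True
```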


Hypothesis Testing - Chi Squared Test

Boston University School of Public Health

## Introduction

## Learning Objectives

After completing this module, the student will be able to:

- Perform chi-square tests by hand
- Appropriately interpret results of chi-square tests
- Identify the appropriate hypothesis testing procedure based on type of outcome variable and number of samples

## Tests with One Sample, Discrete Outcome

Test statistic for testing \(H_0: p_1 = p_{10}, p_2 = p_{20}, \ldots, p_k = p_{k0}\)

\(H_0: p_1 = 0.02, p_2 = 0.39, p_3 = 0.36, p_4 = 0.23\) or equivalently

\(H_0\): Distribution of responses is 0.02, 0.39, 0.36, 0.23

The formula for the test statistic is:

\(\chi^2 = \sum \dfrac{(O - E)^2}{E}\)

The test statistic is computed as follows:

We presented the following approach to the test using a Z statistic.

\(H_0: p_1 = 0.75, p_2 = 0.25\) or equivalently \(H_0\): Distribution of responses is 0.75, 0.25

## Tests for Two or More Independent Samples, Discrete Outcome

Test statistic for testing \(H_0\): Distribution of outcome is independent of groups

Two events, A and B, are independent if P(A|B) = P(A), or equivalently, if P(A and B) = P(A) P(B).

P(Group 1 and Response Option 1) = P(Group 1) P(Response Option 1).

P(Group 1 and Response 1) = P(Group 1) P(Response 1),

P(Group 1 and Response 1) = (25/150) (62/150) = 0.069.

P(Group 2 and Response 1) = P(Group 2) P(Response 1),

P(Group 2 and Response 1) = (50/150) (62/150) = 0.138.

The expected frequency in Group 2 and Response 1 is 150(0.138) = 20.7.

Expected Cell Frequency = (Row Total * Column Total)/N.
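The probability argument and the shortcut formula give the same expected frequency. This is a minimal sketch using the Group 2 and Response 1 totals from the text (50 and 62 out of N = 150). The text's 20.7 comes from rounding the joint probability to 0.138 before multiplying; the unrounded value is about 20.67.

```python
n = 150
p_group2 = 50 / n       # P(Group 2)
p_response1 = 62 / n    # P(Response 1)

# Under independence, P(Group 2 and Response 1) = P(Group 2) * P(Response 1).
p_joint = p_group2 * p_response1
expected_from_prob = n * p_joint

# Shortcut: (row total * column total) / N gives the same number.
expected_from_totals = 50 * 62 / n

print(round(p_joint, 3))               # 0.138
print(round(expected_from_prob, 2))    # 20.67
print(round(expected_from_totals, 2))  # 20.67
```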

\(H_0\): Living arrangement and exercise are independent

We now compute the expected frequencies using the formula,

Expected Frequency = (Row Total * Column Total)/N.

Here the new or experimental pain reliever is group 1 and the standard pain reliever is group 2.

Therefore, the sample size is adequate, so the following formula can be used:

Reject \(H_0\) if Z < −1.960 or if Z > 1.960.

We now substitute to compute the test statistic.

We now conduct the same test using the chi-square test of independence.
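For a 2 × 2 table the two approaches are equivalent. The chi-square statistic (without a continuity correction) equals the square of the pooled two-sample Z statistic, and the critical value 3.84 equals 1.960². The pain-reliever counts are not reproduced here, so this sketch borrows the athlete/smoker counts from the Stats and R section as a stand-in (14 of 18 athletes and 0 of 10 non-athletes are non-smokers). The helper names are hypothetical.

```python
import math


def pooled_z(x1, n1, x2, n2):
    """Two-sample Z statistic for proportions using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def pearson_chi_square(table):
    """Pearson's chi-square statistic, no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (o - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, o in enumerate(row)
    )


z = pooled_z(14, 18, 0, 10)
chi2 = pearson_chi_square([[14, 0], [4, 10]])
print(round(z ** 2, 4), round(chi2, 4))  # both equal 140/9
```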

\(H_0\): Treatment and outcome (meaningful reduction in pain) are independent

The formula for the test statistic is:

\(\chi^2 = \sum \dfrac{(O - E)^2}{E}\)

We now compute the expected frequencies using:

Expected Frequency = (Row Total × Column Total)/N.

## Chi-Squared Tests in R

## Answer to Problem on Pancreaticoduodenectomy and Surgical Apgar Scores

\(H_0\): Apgar scores and patient outcome are independent of one another.

\(H_A\): Apgar scores and patient outcome are not independent.

Since 14.3 is greater than 9.49, we reject \(H_0\).
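As a cross-check, this sketch assumes that 9.49 is the α = 0.05 critical value for df = 4; the table's dimensions are not shown here. For an even number of degrees of freedom the chi-square upper tail has the closed form e^(−x/2) Σ (x/2)^k / k! for k = 0 … df/2 − 1, so the p-value needs only the standard library. `chi2_sf_even` is a hypothetical helper name.

```python
import math


def chi2_sf_even(x, df):
    """P(X > x) for a chi-square with an even number of degrees of freedom."""
    if df % 2:
        raise ValueError("this closed form holds only for even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


df = 4  # assumed from the 9.49 critical value
print(f"P(X > 9.49) = {chi2_sf_even(9.49, df):.3f}")  # P(X > 9.49) = 0.050
p_value = chi2_sf_even(14.3, df)
print(f"p-value for 14.3 = {p_value:.4f}")            # p-value for 14.3 = 0.0064
```

A p-value of about 0.006 is below 0.05, which agrees with rejecting \(H_0\).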
