Lesson 5: Human Boxplots
Lesson 5: Container Boxplots
Objective
You will learn how and when to use boxplots to compare groups of data. You will also learn how to compute and interpret another measure of spread: the IQR.
Vocabulary
boxplot, quartiles, first quartile (Q1), third quartile (Q3), quantiles, minimum, maximum, five-number summary, range, interquartile range (IQR)
Essential Concepts
A common statistical question is “How does this group compare to that group?” This is a hard question to answer when the groups have lots of variability. One approach is to compare the centers, spreads, and shapes of the distributions. Boxplots are a useful way of comparing distributions from different groups when all of the distributions are unimodal (one hump).
Lesson
-
We have been using the following numerical and graphical summaries to look at data:
- Measures of center: mean, median
- Measures of spread: MAD
- Graphing: dotplots, histograms
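As a quick refresher, the numerical summaries listed above can be computed in a few lines. This is only a sketch: the heights are made-up values, and MAD is assumed here to mean the mean absolute deviation from the mean.

```python
from statistics import mean, median

# Hypothetical container heights in centimeters (made up for illustration).
heights = [8, 10, 12, 15, 20]

# Measures of center.
center_mean = mean(heights)
center_median = median(heights)

# Measure of spread: MAD, assumed here to be the mean absolute deviation,
# i.e. the average distance of each height from the mean.
mad = mean(abs(h - center_mean) for h in heights)

print("mean:", center_mean)
print("median:", center_median)
print("MAD:", mad)
```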
-
All of these tools help us describe data to someone who may not be looking at the data set. Today we will explore another way to summarize and describe data to others: a boxplot, a type of statistical plot that breaks the data up into distinct pieces.
-
For the next activity you will need your IDS Journal, a pen, poster paper (or the largest piece of paper you can find), and some empty wall/floor space.
-
Go to your kitchen area at home and gather food containers (plastic, canned or glass bottles) of different sizes and heights. The containers do not have to be empty, nor do they have to be strictly food containers, so check your refrigerator, cabinets, under the sink, or other areas of the house (e.g., the laundry room) if necessary. You will need at least 19 of them.
-
If you have room, line up all of the containers that you have gathered on the floor next to an empty wall. Just by looking at the containers, do you think you can tell which of them represents the typical height of food containers in your home? How would you be able to tell?
-
Do a quick write in your IDS Journal about how you might organize the containers to get a better sense of their range in sizes.
-
Line the containers up in height order from shortest to tallest against the wall.
-
Once you have arranged them, how would you describe the distribution of heights? Do you observe any trend?
-
Now split your distribution into two groups, one half that is taller and one half that is shorter, and decide which container represents the median height.
-
Next, tape a large piece of paper, ideally poster paper if you have some, vertically (in "portrait" view) onto the wall so that its top edge is a few inches higher than your tallest container. Place the "median" container next to the wall directly in front of the paper. You will be creating a plot using lines drawn at certain heights.
-
Draw a horizontal line on the poster paper to mark the location of the median. Be sure to label this point as the median and include the container's actual height, in centimeters.
-
Next, split the two halves again so there are now four groups of containers. You will now find the median of each half.
-
Using the container that represents the median of the shorter half, draw another horizontal line on the paper marking its height. You should place the container in the same spot as the container that represented the median so that the line for this container is drawn underneath the median line. Be sure to label this point as the first quartile (or Q1) and include the container's actual height, in centimeters.
-
Using the container that represents the median of the taller half, draw another horizontal line on the paper marking its height. The container should stand in the same spot as the container that represented the median so that the line for this container is drawn above the median line. Be sure to label this point as the third quartile (or Q3) and include the container’s actual height, in centimeters.
-
The breaks between the groups are called quartiles because they break the data into four groups (quartile comes from the Latin word quartus, meaning "fourth", which is related to the Spanish word cuatro). The lower break represents the first quartile (because 25% of the containers are shorter than the container at this break), and the upper break represents the third quartile (because 75% of the containers are shorter than the container at this break). A more general term for breaks like these is quantiles (percentiles are another example), because each one represents the quantity of data that is lower than that value.
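The splitting procedure you just carried out with real containers can also be sketched in code. The heights below are made-up values for 19 containers, and the function name `quartiles_by_halves` is hypothetical. The sketch assumes that when the number of containers is odd, the median container belongs to neither half; other conventions for computing quartiles exist and can give slightly different answers.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) by taking the median of each half."""
    ordered = sorted(values)          # line up from shortest to tallest
    n = len(ordered)
    half = n // 2
    shorter_half = ordered[:half]     # the shorter half
    taller_half = ordered[n - half:]  # the taller half (skips the middle if n is odd)
    return median(shorter_half), median(ordered), median(taller_half)

# Hypothetical heights, in centimeters, of 19 containers.
heights = [5, 7, 8, 10, 11, 12, 12, 14, 15, 16,
           18, 19, 20, 22, 23, 25, 27, 30, 33]

q1, med, q3 = quartiles_by_halves(heights)
print("Q1:", q1, "median:", med, "Q3:", q3)
```

With 19 containers, the median is the 10th container in line, and each half holds 9 containers, so Q1 and Q3 are the 5th and 15th containers.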
-
Finally, put the tallest and shortest containers in front of the paper and draw horizontal lines at their heights. The shortest container represents the minimum height of the containers in your kitchen, and the tallest represents the maximum height. Be sure to label the points as the minimum and maximum, and include the containers’ actual heights, in centimeters. The video below will show you what to do.
-
When you finish, you should have five lines which represent the five-number summary: minimum, first quartile, median, third quartile, and maximum. Draw a box using the first and third quartiles as the edges of the box. The median line will be contained within the box. Extend a line from the first quartile down to the minimum, and extend a line from the third quartile up to the maximum. Your boxplot should look similar to the following:
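To check the five lines on your poster, the five-number summary can be computed from a list of measured heights. This is a minimal sketch with made-up heights; the function name `five_number_summary` is hypothetical, and the quartiles are found with the same median-of-each-half approach used in the activity (leaving the middle value out of both halves when the count is odd).

```python
from statistics import median

def five_number_summary(values):
    """Return the minimum, Q1, median, Q3, and maximum of the values."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return {
        "minimum": ordered[0],
        "Q1": median(ordered[:half]),                 # median of the shorter half
        "median": median(ordered),
        "Q3": median(ordered[len(ordered) - half:]),  # median of the taller half
        "maximum": ordered[-1],
    }

# Hypothetical container heights in centimeters.
heights = [6, 9, 11, 13, 14, 17, 21]

summary = five_number_summary(heights)
for label, value in summary.items():
    print(f"{label}: {value} cm")
```

The box of the boxplot runs from the Q1 value to the Q3 value, and the two lines extend out to the minimum and the maximum.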
-
Sketch the boxplot in your IDS Journal, with the appropriate labels.
-
The difference between the largest and smallest heights - or the maximum minus the minimum - is the range of the data set. Answer the following questions in your IDS Journal using the boxplot that you just created:
-
In centimeters, what is the difference between the largest and smallest heights? Is there a large difference between the tallest and shortest containers?
-
In centimeters, what is the difference between the quartiles Q1 and Q3? We know that 100 percent of the distribution (the containers) falls between the minimum and maximum heights. What percent of your distribution falls between Q1 and Q3?
Fifty percent (half) of the containers fall between Q1 and Q3. The difference Q3 - Q1 is known as the interquartile range (or IQR).
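Both measures are simple subtractions once the five-number summary is known. The sketch below uses a made-up five-number summary, so the numbers are only an illustration and will differ from your own containers.

```python
# Hypothetical five-number summary, in centimeters.
minimum, q1, med, q3, maximum = 6, 9, 13, 17, 21

# Range: the distance between the tallest and shortest containers.
data_range = maximum - minimum

# IQR: the distance between the third and first quartiles,
# which is the height of the box in the boxplot.
iqr = q3 - q1

print("range:", data_range, "cm")
print("IQR:", iqr, "cm")
```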
-
-
You learned about one measure of spread (the MAD) during the previous lesson, and you now have another measure of spread – the IQR. In your IDS Journal, answer the following questions:
-
What does it mean when the IQR is small? Choose one of the two phrases in parentheses below:
The middle 50% of heights are (close to / spread out from) each other.
-
What does it mean when the IQR is large? Choose one of the two phrases in parentheses below:
The middle 50% of heights are (close to / spread out from) each other.
-
-
Finally, split the containers into two subsets: canned and plastic/glass. Create a boxplot of heights for each subset on a piece of paper, using the techniques you learned in this lesson.
-
In your IDS Journal:
- Sketch the boxplot for each subset. Make sure each is clearly labeled with the five-number summary: minimum, Q1, median (Q2), Q3, maximum.
- Write about the similarities and differences between the plots.
- Compare each boxplot to the overall combined boxplot of heights that you created earlier.
- Calculate the IQR for both plots.
- What does the IQR tell us about the heights of the containers in each subset?
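If you would like to check your IQR calculations for the two subsets, they could be sketched as follows. The group names follow the split described above, the heights are made-up values, and the helper `iqr` is a hypothetical function that uses the same median-of-each-half approach as the activity.

```python
from statistics import median

def iqr(values):
    """Return Q3 - Q1, with each quartile taken as the median of one half."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                 # median of the shorter half
    q3 = median(ordered[len(ordered) - half:])  # median of the taller half
    return q3 - q1

# Hypothetical heights, in centimeters, for each subset of containers.
groups = {
    "canned": [8, 10, 11, 11, 12],
    "plastic/glass": [9, 14, 20, 25, 30],
}

# Compute the IQR separately for each subset.
iqrs = {name: iqr(heights) for name, heights in groups.items()}

for name, value in iqrs.items():
    print(f"{name}: IQR = {value} cm")
```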
Reflection
What are the essential learnings you are taking away from this lesson?
Homework
Complete the Ages of Oscar Winners handout.
Click on the document name to download a fillable copy of the Ages of Oscar Winners handout (LMR_2.7).