Lab 1G: What’s the FREQ?
Directions: Follow along with the slides and answer the questions in bold font in your IDS Journal.
Clean it up!
-
In Lab 1F, we saw how we could clean data to make it easier to use and analyze.
– Using the data you cleaned, we can start analyzing a small set of variables from the American Time Use (ATU) survey.
– The process of cleaning and then analyzing data is very common in Data Science.
-
In this lab, we'll learn how we can create frequency tables to detect relationships between categorical variables.
– Use the data() function to load the atu_clean data file to use in this lab.
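As a rough illustration of what data() does, the sketch below loads iris, a data set that ships with R, rather than the lab's atu_clean file. The choice of iris is only an assumption made so the example runs anywhere; the lab's data is loaded by name in the same way.

```r
# Load a built-in data set (not the lab's data) into the workspace by name.
data(iris)

# Quick checks that the data arrived: its size and its variable names.
dim(iris)
names(iris)
```

After the call, the data frame is available under the same name that was passed to data().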
How do we summarize categorical variables?
-
When we're dealing with categorical variables, we can't just calculate an average to describe a typical value.
– (Honestly, what's the average of categories orange, apple and banana, for instance?)
-
When trying to describe categorical variables with numbers, we calculate frequency tables.
Frequency tables?
-
When it comes to categories, about all you can do is count or tally how often each category comes up in the data.
-
Fill in the blanks below to answer the following: How many more females than males are there in our ATU data?
tally(~ ____, data = ____)
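To make the idea of tallying concrete, here is a minimal sketch on made-up fruit data (not the ATU survey). It uses base R's table() function as a stand-in for the lab's tally() function, so the output layout may differ slightly from what you see in the lab.

```r
# Made-up data for illustration only (not from the ATU survey).
fruit <- c("orange", "apple", "banana", "apple", "orange", "apple")

# mean(fruit) would return NA with a warning, because categories are not numbers.
# Counting how often each category appears does work:
counts <- table(fruit)
counts

# How many more apples than oranges are there?
counts[["apple"]] - counts[["orange"]]   # 3 apples - 2 oranges = 1
```

Subtracting two counts from the table is one way to answer a "how many more" question.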
Two-way frequency tables
-
Counting the categories of a single variable is nice, but oftentimes we want to make comparisons.
-
Use a line of code that's similar to how we facet plots to tally the number of people with physical challenges, as well as their genders.
– Does one gender seem to have a higher occurrence of physical challenges than the other? If so, which one? Explain your reasoning.
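A two-way frequency table counts every combination of two categorical variables. The sketch below shows the shape of such a table using made-up data with hypothetical variable names (pet and region), and base R's table() rather than the lab's tally() function.

```r
# Made-up responses for illustration only (hypothetical variable names).
pets <- data.frame(
  pet    = c("cat", "dog", "dog", "cat", "dog", "dog", "cat", "dog"),
  region = c("north", "north", "north", "south", "south", "south", "south", "south")
)

# Rows are pet categories, columns are regions; each cell is a count.
two_way <- table(pet = pets$pet, region = pets$region)
two_way
```

Each column can be read as its own one-way table for that region, which is what makes side-by-side comparisons possible.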
Interpreting two-way frequency tables
-
Recall that there were 1153 more women than men in our data set.
– If there are more women, then we might expect women to have more physical challenges (compared to men).
-
To compare groups of different sizes fairly, we use percentages instead of counts.
-
Include format = "percent" as an option in the code you used to make your 2-way frequency table. Then answer this question again:
– Does one gender seem to have a higher occurrence of physical challenges than the other? If so, which one? Explain your reasoning.
– Did your answer change from before? Why?
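The sketch below shows why percentages can tell a different story than counts when groups differ in size. The numbers are made up for illustration, and the code uses base R's prop.table() rather than the format = "percent" option of tally().

```r
# Made-up counts for illustration only: two groups of different sizes.
counts <- matrix(
  c(20, 180,   # group A: 20 "yes", 180 "no" (200 people)
    15,  85),  # group B: 15 "yes",  85 "no" (100 people)
  nrow = 2,
  dimnames = list(answer = c("yes", "no"), group = c("A", "B"))
)

# margin = 2 divides each cell by its column total,
# so each group's percentages add up to 100.
percents <- 100 * prop.table(counts, margin = 2)
percents
```

Group A has the larger count of "yes" answers (20 versus 15), but group B has the larger percentage (15% versus 10%), because group B is half the size.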
One final option
-
It's often helpful to display totals in our 2-way frequency tables.
– To include them, add margins = TRUE as an option in the tally function.
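For a sense of what totals look like in a two-way table, here is a small sketch on made-up data with hypothetical variable names. It uses base R's addmargins() as a stand-in for the margins = TRUE option, so the label on the totals ("Sum") may differ from what tally() prints.

```r
# Made-up responses for illustration only (hypothetical variable names).
pets <- data.frame(
  pet    = c("cat", "dog", "dog", "cat", "dog", "dog", "cat", "dog"),
  region = c("north", "north", "north", "south", "south", "south", "south", "south")
)

two_way <- table(pet = pets$pet, region = pets$region)

# Append a "Sum" row and a "Sum" column holding the totals.
with_totals <- addmargins(two_way)
with_totals
```

The bottom-right cell is the total number of observations, which is a quick check that no rows were lost.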
On your own
-
Describe what happens if you create a 2-way frequency table with a numerical variable and a categorical variable.
-
How are the types of statistical questions that 2-way frequency tables can answer different from those that 1-way frequency tables can answer?
-
Which gender has a higher rate of part-time employment?
-
Does one gender socialize more than the other? To answer this question, first:
– Create a subset of the ATU data that includes only people who socialized for more than 0 minutes.
– Create a histogram and include type = "percent" as an option in the function.
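The subsetting step above can be sketched on made-up data as follows. The data frame and its variable names (group, minutes) are hypothetical, and base R's subset() is used here; the lab may expect a different subsetting function.

```r
# Made-up data for illustration only (hypothetical variable names).
visits <- data.frame(
  group   = c("A", "B", "A", "B", "A"),
  minutes = c(0, 30, 45, 0, 10)
)

# Keep only the rows where minutes is greater than 0.
positive <- subset(visits, minutes > 0)
positive

nrow(positive)   # 3 of the 5 rows remain
```

Dropping the zeros first keeps a large spike at 0 from dominating the plot of the remaining values.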