Lab 2I - R’s Normal Distribution Alphabet
Directions: Follow along with the slides and answer the questions in bold font in your journal.
Where we're headed
-
In the last lab, you were able to overlay a normal curve on histograms of data to help you decide if the data's distribution is close to a normal distribution.
– We also saw that calculating the mean across random shuffles produces differences that are approximately normally distributed.
-
In this lab, we'll learn how to use some other R functions to:
– Simulate random draws from a normal distribution.
– Calculate probabilities with normal distributions.
Get set up
-
Start by loading the titanic data and calculate the mean age of people in the data, but shuffle their survival status 500 times.
– Assign this data the name shfls.
-
After creating shfls, use mutate to add a new variable to the data set. This new variable should have the name diff and should be the mean age of those who survived minus the mean age of those who died.
-
Finally, calculate the mean and sd of the diff variable.
– Assign these values the names diff_mean and diff_sd.
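To make the shuffling idea concrete, here is a minimal sketch in base R. It is not the lab's own code: it uses base R's sample and replicate instead of the shuffle and mutate steps named above, and the data (age, survived) are made-up stand-ins, not the titanic data. The names one_shuffle, diffs, sketch_mean and sketch_sd are hypothetical.

```r
# Hypothetical stand-in data: NOT the titanic data used in the lab.
set.seed(1)
age      <- round(runif(200, min = 1, max = 70))
survived <- sample(c("Yes", "No"), size = 200, replace = TRUE)

# One shuffle: randomly permute the survival labels, then compute
# mean age of "survivors" minus mean age of "non-survivors".
one_shuffle <- function() {
  shuffled <- sample(survived)  # random permutation of the labels
  mean(age[shuffled == "Yes"]) - mean(age[shuffled == "No"])
}

# Repeat the shuffle 500 times and summarize the differences.
diffs       <- replicate(500, one_shuffle())
sketch_mean <- mean(diffs)
sketch_sd   <- sd(diffs)
```

Because the labels are shuffled at random, the differences should center near 0, which is what "by chance alone" looks like.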
Is it normal?
-
Before we proceed, we need to verify that our diff variable looks approximately normally distributed.
– Is the distribution close to normal? Explain how you determined this. Describe the center and spread of the distribution.
– Compute the mean difference in the age of the actual survivors and the actual non-survivors.
Using the normal model
-
Since the distribution of our diff variable appears normally distributed, we can use a normal model to estimate the probability of seeing differences that are more extreme than our actual data.
-
Fill in the blanks to calculate the probability of an even smaller difference occurring than our actual difference using a normal model.
pnorm(____, mean = diff_mean, sd = ____)
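As a separate illustration of what pnorm returns, here is a small sketch with made-up numbers (a value of 60 under a normal model with mean 70 and standard deviation 10). These values are assumptions chosen for the example and are not the answers to the blanks above.

```r
# Hypothetical values for illustration only (not from the titanic data).
observed <- 60   # value whose lower-tail probability we want
mu       <- 70   # mean of the normal model
sigma    <- 10   # standard deviation of the normal model

# P(X < observed) under a normal model with mean mu and sd sigma.
p_lower <- pnorm(observed, mean = mu, sd = sigma)

# The same probability after converting to a z-score first.
z         <- (observed - mu) / sigma
p_lower_z <- pnorm(z)
```

Here the value sits one standard deviation below the mean, so both calls give a probability of about 0.159.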
Extreme probabilities
-
The probability you calculated in the previous slide is an estimate for how often we expect to see a difference smaller than the actual one we observed, by chance alone.
– Draw a sketch of a normal curve. Label the mean age difference, based on your shuffles, and the actual age difference of survivors minus non-survivors from the actual data. Then, shade in the area under the normal curve that lies below the actual difference.
-
If we instead wanted to calculate the probability that the difference would be larger than the one observed, we could run (fill in the blanks):
1 - pnorm(____, mean = diff_mean, sd = ____)
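The subtraction works because the areas below and above any value must add up to 1. A short sketch with made-up numbers (85 under a normal model with mean 70 and standard deviation 10, all assumed for illustration) shows this, along with pnorm's lower.tail argument, which gives the upper area directly.

```r
# Hypothetical values for illustration only.
observed <- 85
mu       <- 70
sigma    <- 10

# Area below the observed value.
p_lower <- pnorm(observed, mean = mu, sd = sigma)

# Area above the observed value, computed two equivalent ways.
p_upper_a <- 1 - pnorm(observed, mean = mu, sd = sigma)
p_upper_b <- pnorm(observed, mean = mu, sd = sigma, lower.tail = FALSE)

# The two tails together cover the whole curve.
total <- p_lower + p_upper_a
```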
Simulating normal draws
-
We can simulate random draws from a normal distribution with the rnorm function.
– Fill in the blanks in the following two lines of code to simulate 100 heights of randomly chosen men. Assume the mean height is 67 inches and the standard deviation is 3 inches.
– Plot your simulated heights with a histogram.
draws <- rnorm(____, mean = ____, sd = ____)
histogram(draws, fit = ____)
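For comparison, here is a sketch of rnorm on a different, made-up scenario: 1000 exam scores with an assumed mean of 70 and standard deviation of 10. The numbers and the name scores are hypothetical and are not the answers to the blanks above.

```r
# Hypothetical scenario for illustration only.
set.seed(42)  # makes the random draws reproducible

# Simulate 1000 draws from a normal model with mean 70 and sd 10.
scores <- rnorm(1000, mean = 70, sd = 10)

# The sample mean and sd should land close to the values we asked for,
# but not exactly on them, because the draws are random.
sample_mean <- mean(scores)
sample_sd   <- sd(scores)
```

With more draws, the sample mean and standard deviation tend to fall closer to the model's values.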
P's and Q's
-
We've seen that we can use pnorm to calculate probabilities based on a specified quantity.
– Hence why we call it "P" norm.
-
Now we'll see how to do the opposite. That is, calculate the quantity for a specific probability.
– Hence why we'll call this a "Q" norm.
-
How tall can you be and still be in the shortest 25% of heights if the mean height is 67 inches with a standard deviation of 3 inches?
qnorm(____, mean = ____, sd = ____)
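One way to see that qnorm undoes pnorm is to feed the result of one into the other. The sketch below uses made-up numbers (the lowest 10% under a normal model with mean 70 and standard deviation 10), which are assumptions for illustration and not the answers to the blanks above.

```r
# Hypothetical values for illustration only.
mu    <- 70
sigma <- 10

# Quantity below which 10% of the distribution falls.
cutoff <- qnorm(0.10, mean = mu, sd = sigma)

# Feeding that quantity back into pnorm recovers the probability.
p_back <- pnorm(cutoff, mean = mu, sd = sigma)

# The 50% quantile of a normal distribution is its mean.
median_value <- qnorm(0.50, mean = mu, sd = sigma)
```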
On your own
-
Using the titanic data, answer the following statistical question:
– Were women on the Titanic typically younger than men?
– Use a histogram, 500 random shuffles and a normal model to answer the question in the bullet above.