Lesson 15: Tangible Data Merging

Objective

You will learn how to merge two data sets and ask statistical questions about the merged data.

Vocabulary

merge

Essential Concepts

You can enhance the context of a statistical problem by merging related data sets together. To merge data, both data sets must share a "unique identifier" variable that tells you how to match up the rows (observations) of the data.
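
As a rough illustration of this idea, here is a minimal Python sketch that matches rows from two small data sets on a shared unique identifier. The rows, the column names, and the helper `merge_on` are all made up for this example (they are not taken from the lesson's handout), and the lab that follows this lesson uses RStudio rather than Python.

```python
# Hypothetical rows for illustration only; not the handout's data.
data_set_1 = [
    {"id_number": 101, "age": 21, "favorite_movie": "Movie A"},
    {"id_number": 102, "age": 17, "favorite_movie": "Movie B"},
    {"id_number": 103, "age": 19, "favorite_movie": "Movie C"},
]
data_set_2 = [
    {"id_number": 103, "siblings": 1, "education": "High School"},
    {"id_number": 101, "siblings": 2, "education": "High School"},
    {"id_number": 102, "siblings": 0, "education": "Middle School"},
]


def merge_on(key, left, right):
    """Combine each row of `left` with the row of `right` that has the same `key` value."""
    # Index the right-hand rows by the identifier so each lookup is direct.
    # This assumes `key` is unique within `right`.
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Put the columns of both rows into one merged observation.
            merged.append({**row, **match})
    return merged


merged = merge_on("id_number", data_set_1, data_set_2)
for row in merged:
    print(row)
```

Because every ID number appears exactly once in each data set, each row has exactly one possible partner, so no guessing is needed.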

Lesson

  1. In this lesson you are going to examine the research question "Does the Personality Color test really work?" To answer this, you will examine whether the different color groups actually differ on particular beliefs or attitudes, or if these differences might just be due to chance. You are going to use the Stress/Chill data to see if there is evidence that the "colors" actually differ.

  2. Go back to the Stress/Chill Campaign Guidelines and refamiliarize yourself with the variables in the campaign. For the Personality Color survey, the variables are: birth gender, predominant color, secondary color, orange score, green score, gold score, blue score, zodiac sign, sports involvement.

  3. In your IDS Journal, write at least 4 statistical questions of interest that involve variables in both the Stress/Chill campaign AND the Personality Color survey. Here's an example: Do people whose predominant color is Gold tend to stress more than people whose predominant color is Blue?

  4. For the next activity, click on the document name to download a fillable copy of the Tangible Data Merging document (LMR_2.14). Use a different color of paper for each of the two data sets. For example, Data set 1 could be on plain white paper and Data set 2 could be on blue paper. Cut the paper by creating horizontal strips of each observation of data.

  5. The data in the Tangible Data Merging document come from two different data sets. Explore the data sets.

    What do you notice about both data sets?

  6. Each observation from the data sets in (LMR_2.14) is available in this slide. Click on the link to make a copy of the slide. The observations for data set 1 are blue and the ones for data set 2 are yellow. You will be able to move each blue observation around to "match up" or merge it with a yellow observation.

  7. As an example, look at the two observations below. You might decide to merge these two observations because it makes sense that a person who is 21 years old has probably graduated from high school. Go to your slide and merge each blue observation with a yellow observation.

    Birth Month   Zip Code   Age   ID Number   Favorite Movie
    January       90064      21    1742        The Notebook

    Zip Code   ID Number   Birth Month   Siblings   Education
    91331      1352        August        2          High School

  8. Notice that making guesses about a person's characteristics is not the best way to match up the data. After answering the questions below, go back to the merged data in your slide and make changes if necessary.

    1. Which variables are the same in both data sets?

    2. Why are the variables Birth Month and Zip Code not the best way to merge the data?

    3. Which of the variables in both data sets is unique, meaning that no two people share the same value (unlike, for example, birth month or zip code)?

    Rearrange the data in your slide (if needed).

  9. Answer the following in your IDS Journal:

    Why is it important to have at least one unique identifier (e.g., ID Number) for both data sets?

  10. In the next lab, you will learn to merge data sets using RStudio.
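
The questions in step 8 can also be checked mechanically. The sketch below, written in Python with made-up rows, tests each candidate variable to see whether any value is shared by more than one person. The helper names `is_unique` and `shared_values` are hypothetical and exist only for this illustration.

```python
from collections import Counter

# Hypothetical rows for illustration only; not the handout's data.
people = [
    {"id_number": 101, "birth_month": "March", "zip_code": "90001"},
    {"id_number": 102, "birth_month": "March", "zip_code": "90002"},
    {"id_number": 103, "birth_month": "July", "zip_code": "90001"},
]


def is_unique(rows, key):
    """Return True if no two rows share the same value of `key`."""
    values = [row[key] for row in rows]
    return len(values) == len(set(values))


def shared_values(rows, key):
    """Return the values of `key` that occur more than once, with their counts."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


for key in ("birth_month", "zip_code", "id_number"):
    print(key, "unique:", is_unique(people, key), "shared:", shared_values(people, key))
```

In this toy data, two people share a birth month and two share a zip code, so matching on either variable leaves more than one possible partner for some rows. Only the ID number identifies each person unambiguously.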

Reflection

What are the essential learnings you are taking away from this lesson?

Homework & Next Day

Collect data for one more day for the Stress/Chill campaign, either through the IDS UCLA App or via web browser at https://tools.idsucla.org.

LAB 2G: Getting it Together

Complete Lab 2G before the Practicum.