Lesson 4: The Data Cycle

Objective

You will learn about the Data Cycle and its components.

Vocabulary

data cycle, statistical questions, consider data, data analysis, data interpretation

Essential Concepts

A statistical investigation involves answering a research question by posing and answering statistical questions using data. The Data Cycle is a guide for carrying out a statistical investigation, and you should visit each stage of the Data Cycle at least once in an investigation. The Consider Data phase might simply involve examining a data set to see which variables it has and considering which of those variables can be used to answer your question. Data Analysis is almost always done on a computer and consists of creating relevant graphics and numerical summaries of the data. Data Interpretation involves using the analysis to answer the statistical questions.

ATTENTION

For every single lesson:

Answer the questions in red font in your Introduction to Data Science (IDS) Journal.

Lesson

  1. During the past few lessons we have discussed what data are, how to collect and organize them, and how their values can vary. But what do we do with all this data? How can we navigate it and turn it into something useful to us?

  2. Today you will be learning about the Data Cycle. The Data Cycle is a guide we can use when learning to think about data. We typically start with asking questions. Here is a graphic of The Data Cycle:


  3. Below is an overview of each component of the Data Cycle. We will explore each component more explicitly throughout the course.

    1. Statistical Questions (Ask Questions): Statistical questions are questions that a) address variability, and b) can be answered with data.

    2. Consider Data: This is the process of observing and recording data, or of examining previously collected data to make sure it meets the needs of the investigation.

    3. Data Analysis (Analyze Data): During analysis, tables, graphs, and summaries of the data are produced to help us find patterns and relationships.

    4. Data Interpretation (Interpret Data): The statistical questions are answered by referring to the tables, graphs, and summaries made in the Data Analysis phase.

  4. To help you get a firm understanding of the Data Cycle, we will apply it to the story of a pizza delivery person in Austin, Texas. This pizza delivery person kept a blog using the name "Pizza Girl" (https://slice.seriouseats.com/2010/04/statistical-analysis-of-a-pizza-delivery-shift-20100429.html, April 29, 2010). On her blog, Pizza Girl recorded detailed notes of how she spent her time at work. While she was in the pizza shop, Pizza Girl was paid by the hour; but when she went out to make deliveries she had to clock out, and the only money she made was what she received in tips. This got Pizza Girl thinking, "How is my time most profitably spent?" To help her answer this research question, she began considering her tips.

    1. Statistical Question: Does method of payment affect my tip?

      Pizza Girl wondered whether she made more tip money from deliveries to customers who prepaid or from those who paid with cash, a credit card, or a check.

    2. Consider Data: The image below shows how Pizza Girl recorded her data. It contains the first 18 entries in her data set. Before she can analyze the data using a computer, she needs to organize it in rows and columns.

      Organize the data the same way it would appear on a spreadsheet, with the columns Tips, Payment Method, and Time.

    3. Data Analysis: Pizza Girl made an attempt to analyze her data but wasn't completely satisfied. Coincidentally, a statistician who was searching for good pizza came across her blog and helped her with her analysis. He created the plot below. You'll learn later how to interpret these boxes, but for now it's enough to know that each box shows the variety of tips observed for each payment method. For example, the tips from those who prepaid ("Pre") ranged from $1 to just over $5, with half of the values between $2 and $4.50.

    4. Data Interpretation: From now on Pizza Girl will focus on deliveries to people who pay in cash, since those customers typically tip between $3 and $4. She'll also consider making deliveries to customers who prepay because she might get lucky and score a big tip.
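
The Pizza Girl walk-through above can also be sketched in code. The following is a minimal Python sketch, not Pizza Girl's or the statistician's actual analysis. The tip amounts, times, and payment labels are made-up placeholder values (her real entries are in the image), and the function name `summarize_tips` is hypothetical. It organizes the records in rows with the columns Tips, Payment Method, and Time, then computes the minimum, median, and maximum tip for each payment method.

```python
from collections import defaultdict
from statistics import median

# Consider Data: each row is one delivery, with the columns
# Tips, Payment Method, and Time, as on a spreadsheet.
# NOTE: these values are invented for illustration only;
# they are NOT Pizza Girl's actual entries.
rows = [
    {"tips": 3.00, "payment_method": "Cash", "time": "17:05"},
    {"tips": 4.00, "payment_method": "Cash", "time": "17:40"},
    {"tips": 3.50, "payment_method": "Cash", "time": "18:10"},
    {"tips": 1.00, "payment_method": "Pre", "time": "18:25"},
    {"tips": 5.00, "payment_method": "Pre", "time": "19:00"},
    {"tips": 2.00, "payment_method": "Credit", "time": "19:30"},
    {"tips": 2.50, "payment_method": "Credit", "time": "20:15"},
]


def summarize_tips(rows):
    """Data Analysis: group tips by payment method and summarize each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["payment_method"]].append(row["tips"])
    return {
        method: {"min": min(tips), "median": median(tips), "max": max(tips)}
        for method, tips in groups.items()
    }


summary = summarize_tips(rows)

# Data Interpretation: compare the summaries to answer the statistical question.
for method, stats in sorted(summary.items()):
    print(f"{method}: min ${stats['min']:.2f}, "
          f"median ${stats['median']:.2f}, max ${stats['max']:.2f}")
```

A numerical summary like this carries part of the information that a box in the statistician's plot shows: where the tips for each payment method start, where they end, and where the middle value falls.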

  5. Now it's your turn! To check your understanding of the Data Cycle, you will create a Data Cycle Spinner with your interpretation of each component. To create your own Data Cycle Spinner, look for supplies around your home: cardboard or paper, markers, and something to insert through the center of your spinner to make it spin.

  6. Create your own spinner. Remember to label each component and to write or draw your interpretation of it (see example below).

  7. The images below contain examples of components of the Data Cycle (Statistical Questions, Consider Data, Data Analysis, Data Interpretation). For each image you will need to decide which part of the Data Cycle is represented. There is no right or wrong answer as long as you provide a sound justification.

    1. Where in the Data Cycle are we? Explain.

    2. Where in the Data Cycle are we? Explain.

    3. Where in the Data Cycle are we? Explain.

    4. Where in the Data Cycle are we? Explain.

    5. Where in the Data Cycle are we? Explain.

  8. Were you a little doubtful about the statement in image #7c, "Smokers typically have larger lung capacities than non-smokers"? Why or why not? What did the spreadsheet in image #7e reveal?

  9. Something to ponder: The conclusion made in image #7c, "Smokers typically have larger lung capacities than non-smokers," was a correct interpretation of the visual in #7d. However, it is important to understand from whom the data was collected. As image #7e reveals, the data included youths whose ages ranged from 3 to 19 years old. So yes, children usually have smaller lung capacities than adults (smokers or non-smokers). The take-away here is to carefully consider whom your data represents.
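
One way to check whom the data represents is to compare people of similar ages. The sketch below is only an illustration in Python: the records are made-up values (hypothetical ages and lung capacities in liters, not the data from the images), and the function name `mean_capacity` is hypothetical. It shows how a comparison across all ages can point in a different direction than a comparison within one age group.

```python
from statistics import mean

# Hypothetical records: (age in years, smoker?, lung capacity in liters).
# Invented for illustration only; NOT the data from the images.
records = [
    (5, False, 1.5), (7, False, 1.8), (9, False, 2.2),
    (15, False, 3.9), (17, False, 4.1),
    (15, True, 3.6), (16, True, 3.7), (18, True, 3.8),
]


def mean_capacity(records, smoker, min_age=0):
    """Mean lung capacity for smokers or non-smokers aged min_age or older."""
    values = [cap for age, smokes, cap in records
              if smokes == smoker and age >= min_age]
    return mean(values)


# All ages together: the non-smokers include young children.
overall_smokers = mean_capacity(records, smoker=True)
overall_nonsmokers = mean_capacity(records, smoker=False)

# Ages 15 and up only: compare people of similar ages.
teen_smokers = mean_capacity(records, smoker=True, min_age=15)
teen_nonsmokers = mean_capacity(records, smoker=False, min_age=15)

print(f"All ages: smokers {overall_smokers:.2f} L, "
      f"non-smokers {overall_nonsmokers:.2f} L")
print(f"Ages 15+: smokers {teen_smokers:.2f} L, "
      f"non-smokers {teen_nonsmokers:.2f} L")
```

In these made-up records, smokers have the larger average when all ages are pooled, but the smaller average once only the older youths are compared, because the pooled non-smoker group includes young children.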

  10. You can enter the Data Cycle at any stage. For example, you can be given a data set and, after considering the variables, come up with statistical questions that can be answered with the data. In the exercise below, you will enter the Data Cycle at the analysis stage because a graph has been created from the data that was collected.

  11. The Bros & Dudes Graphics handout contains 10 pairings of graphs. The graphics were created for the Quartz website by Nikhil Sonnad as a data visualization. He collected the data via Twitter. The graphics show how common certain terms are throughout the United States when referring to friends.

    Choose one of the graphics and come up with 2 questions that could be asked given the graphic you chose. Write these in your IDS Journal.

    Click on the document name to download a fillable copy of the Bros & Dudes Graphics handout (LMR_1.5).

  12. Using ONE of your questions from #11, create a Data Cycle graphic (on a regular piece of paper) to turn in. The cycle should be clearly labeled and should have appropriate responses for each of the 4 components. Some examples have been provided below:

Reflection

What are the essential learnings you are taking away from this lesson?