Description
Bellabeat Case Study
Read through the Anderson (n.d.) case study thoroughly. Ensure you follow the links to additional analysis, evidence, graphs, and so on. Also, it is important to remember that it is far easier to find fault in others’ work than it is to do the work yourself.
There are two topics sections of topics. From the first section, choose one of the topics. In the second section, discuss all of the topics.
Section 1
What is driving this project? (What drives a data science project?) Explain.
Consider the stages presented in this course for data science projects. Connect each of these stages to the stages used in this case study. Explain each connection you make.
Consider the suggestions that the case study analyst provided.
What evidence in the analysis supports each of the suggestions? Explain.
If you feel that there is insufficient evidence to support one of the suggestions. Explain.
- Section 2
- Regarding the scatter plots used in this case study:
- Several scatter plots have many overlapping data points. Is it possible to accurately interpret a plot like this? Explain. Justify your reasoning.
The scatter plots are labeled with a value the author assigned to R2. The author later identifies this as a measure of “correlation.” Pearson’s product-moment correlation coefficient is the score most refer to as “correlation.” However, there are many formal statistical tests for correlation, so it’s always important to specify which type is used. The metric identified as R2 is not used for any of the formal statistical tests for correlation. However, in linear regression, where there is one independent and one dependent variable, the coefficient of determination (R2) is the absolute value of the score for Pearson’s product-moment correlation coefficient (the results of which is called the rho value or r). Whether using linear regression or Pearson’s product-moment correlation, there are several formal statistical assumptions that the data must meet. In particular, both of these analysis methods are extremely susceptible to the influence of outliers.
- Based on the plots and this information, what can you determine regarding the results associated with these plots and scores? Explain.
- Feel free to collect the data and conduct your own analysis. If you do, ensure that you state that you did this in your paper and indicate where you retrieved the data from. Here’s an example of how to do that, “Data used in this analysis was derived from Author (date).” If you used data from an external source but modified it (i.e., cleaning), then you might annotate it this way, “Data used in this analysis was adapted from Author (date).”
American Psychological Association. (2020). Publication manual of the American Psychological Association (7th ed.). https://doi.org/10.1037/0000165-000
Anderson, K. (2022). Bellabeat case study (Sheets, SQL, Data Studio). Kaggle. Retrieved October 1, 2022, from https://www.kaggle.com/code/kander87/bellabeat-case-study-sheets-sql-data-studio