15th September 2023

The initial stage in data analysis is to ensure the linkage of precise and unambiguous data. We used Python Numpy to execute critical statistical operations such as computing medians, means, and standard deviations using data from a project sheet on diabetics, inactivity, and obesity. These computations provide us with a rudimentary understanding of the dataset.

The primary goal we had was to show a correlation between the percentage of people with diabetes and the relation of those who are sedentary. To achieve this, we designed a scatter graph in which each region is a data point. This visual aid was very helpful in assessing the relationship between these two variables. The R-squred values, a statistic that measures the strength of this relationship, were then computed using the scatter graph.

Today’s class, professor addressed many queries by the students about  the dataset, which helped me understand the upcoming phases of analysis. The lecturer pointed out that for this dataset, non-linear models are able to be applied, that might lead to a high R-squared value. To my query on his proposal on applying changes  to the variables, the instructor provided an example that for datasets which has a highly skewed distribution, a log transformation could aid to get them regularly distributed. meanwhile our dataset is almost normally distributed, he recommended against implementing modifications on the parameters in this dataset.

 

Leave a Reply

Your email address will not be published. Required fields are marked *