Raw Data Accuracy & Consistency

Open vivianh2 opened this issue 5 years ago • 0 comments

In fa2018.csv, I noticed that most of the STAT400+ courses contain GPA data only for GR (graduate) sections, but according to Course Explorer, our university also offered several UG (undergraduate) sections for each of these courses.

The README states: "Based on analysis, courses with 20 or fewer students were excluded (the smallest course in the dataset has 21 students)." However, UG sections usually have more students than GR sections, so if anything is excluded, it should be the GPA data of the GR sections rather than that of the UG sections.

Take FA18's STAT432 as an example. The actual enrollment of section 2GR is 19 and that of section 2UG is 53. However, adding up the counts from A+ to F for CRN 70222 (section 2GR) gives 76 students. I am therefore wondering whether an error during data cleaning/integration caused this inconsistency in the raw data.
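
The check described above can be sketched roughly as follows. This is only a minimal illustration, not the repository's actual cleaning code: the grade column names, the `CRN` column name, and the helper names `grade_total`, `find_mismatches`, and `load_rows` are all assumptions and may need adjusting to the real header of fa2018.csv.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# Assumed grade column names; adjust to match the actual CSV header.
GRADE_COLUMNS = ["A+", "A", "A-", "B+", "B", "B-",
                 "C+", "C", "C-", "D+", "D", "D-", "F"]


def grade_total(row: Dict[str, str]) -> int:
    """Sum the A+ through F counts for one row; blank or missing cells count as 0."""
    return sum(int(row.get(col) or 0) for col in GRADE_COLUMNS)


def find_mismatches(
    rows: Iterable[Dict[str, str]],
    enrollment: Dict[str, int],
    crn_column: str = "CRN",  # hypothetical column name
) -> List[Tuple[str, int, int]]:
    """Return (crn, grade_total, enrolled) for rows whose grade total exceeds
    the known enrollment of that CRN. Rows with unknown CRNs are skipped."""
    mismatches = []
    for row in rows:
        crn = row.get(crn_column)
        if crn in enrollment:
            total = grade_total(row)
            if total > enrollment[crn]:
                mismatches.append((crn, total, enrollment[crn]))
    return mismatches


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

With the enrollment figure quoted above, a call such as `find_mismatches(load_rows("fa2018.csv"), {"70222": 19})` would be expected to flag CRN 70222 if its grade counts really do add up to more than 19.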

Apr 10 '19 23:04 vivianh2
