python-socialsci
data set switching in pandas section
The SAFI_results.csv dataset is used in the OpenRefine lesson and in much of the analysis in this lesson, but the first two parts of the pandas section use the SN7577.tab dataset instead.
Is there a reason to switch contexts? Would it be better to use one dataset throughout?
After preparing to teach a workshop with the social science lesson, I am wondering about the same thing. This data switching is very confusing, especially since the SN7577.tab data are never(?) introduced. Perhaps the entire lesson should be taught with the same data, and the same data should be used throughout the workshop, which usually binds everything together nicely.
Suggested solution: change episodes 8, 9, 11, 12, and 14 to use the SAFI data.
(Note: the SQL lesson also uses the SN7577 data instead of the SAFI data.)
What does the curriculum advisory committee think about this issue? From the CAC meeting minutes: SQL lesson: "This lesson needs to be updated to use the SAFI dataset so that it is consistent with the rest of the workshop. It currently uses a database called “SN7577”. To show the advantages and power of using SQL, the data should be split into multiple tables."
@katrintirok and I taught using the SAFI dataset for all of pandas and for matplotlib. Should we contribute those changes?
Hi,
We are currently teaching this lesson in a workshop and also considered this. I ended up teaching Eps 8 and 9 using the SAFI dataset, and would support changing those episodes to that dataset.
However, the SN7577 dataset was well suited for Eps 11 and 12, as its tables seemed simpler to merge (fewer columns) than the SAFI ones. I would support keeping the SN7577 dataset for Eps 11 and 12.
Best, Vini
We created subsets of SAFI for Eps 11 and 12. They're visible in the files repo we made for learners to download exercises and data in advance, and how we used them is visible in my fork.
Hi everyone, we taught this lesson last week and also felt that dealing with multiple datasets was slightly confusing. Is there any plan to merge @brownsarahm's work on consistently using the SAFI dataset throughout the lesson into the main repository?