python-novice-inflammation

The CSV example can give novices the wrong impression of how to structure CSV files

Open bast opened this issue 4 years ago • 1 comments

The CSV data used in this lesson (https://github.com/swcarpentry/python-novice-inflammation/tree/gh-pages/data) has two problems that I consider significant:

  1. The files contain no header line:
  • This means that by looking at the data alone, we have no idea what it represents. I think this is not good practice from the documentation/reproducibility perspective.
  • Many CSV readers use the header line to build a dictionary or dataframe, but that line is missing here.
  • I understand that the header was perhaps omitted so that the files can be read with numpy.loadtxt, but this is not how I would read CSV data. My impression is that the data was adapted to the solution rather than the solution to the data.
  2. The data is not in "tidy" format (https://en.wikipedia.org/wiki/Tidy_data):
  • When teaching data visualization (different course) I emphasize arranging data in tidy format (columns are variables, rows are measurements) so that the data can be extended with more measurements without modifying the analysis/plotting scripts.
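To make the two points concrete, here is a minimal sketch using only the standard library. It assumes a wide, headerless layout in which each row is one patient and each column one day; the sample values and the column names `patient`, `day`, and `inflammation` are hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical wide, headerless sample (assumption: rows are patients, columns are days).
wide_text = "0,1,2\n0,2,1\n"


def wide_to_tidy(text):
    """Return one record per (patient, day) measurement."""
    records = []
    for patient, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        for day, value in enumerate(row, start=1):
            records.append(
                {"patient": patient, "day": day, "inflammation": float(value)}
            )
    return records


tidy = wide_to_tidy(wide_text)

# Write the tidy records with a header line so the file describes itself.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["patient", "day", "inflammation"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(tidy)
tidy_csv = out.getvalue()

# Reading it back: DictReader takes its keys from the header line.
rows = list(csv.DictReader(io.StringIO(tidy_csv)))
```

In this layout, adding more patients or days only appends rows, so a script that filters or groups on `patient` and `day` would not need to change.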

I find it important to show good examples, particularly to novices, because novices often take what they see, assume that this is the way to do it, and use it in their own work. For me this is not a good example, and novices may not realize that it is not a good way to store data for analysis/plotting.

I don't only want to point out problems; I also offer to contribute to fixing this. Before doing that, though, I wanted to start a discussion and get some feedback. It might be just me who has a problem with this.

bast avatar Nov 01 '20 11:11 bast

numpy.loadtxt can now handle headers (for instance, we can use skiprows), and we can also add comments. My guess is that the current format is mainly there for historical reasons.
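As a rough sketch of that suggestion, the snippet below reads a small in-memory file that starts with a comment line and a header line. The file contents and the column names `day1`, `day2`, `day3` are made up for illustration; `skiprows` counts skipped lines including comment lines, so two lines are skipped here.

```python
import io

import numpy as np

# Hypothetical file contents: one comment line, one header line, then the data.
text = (
    "# inflammation per patient (rows) and day (columns)\n"
    "day1,day2,day3\n"
    "0,1,2\n"
    "0,2,1\n"
)

# skiprows=2 skips the comment line and the header line before parsing numbers.
data = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=2)
```

Note that the header is only skipped, not used: the resulting array still carries no column names, which is the limitation the issue describes.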

annefou avatar Nov 01 '20 11:11 annefou