shell-genomics
Episode 03: Working with Files and Directories -- cat something other than a fastq file
The lesson demonstrates cat by applying cat to a fastq file. Can we cat a different file instead? When the learners go back to working with full-size fastq files, cat-ing the whole file is generally a bad idea because fastq files contain a lot of text. Perhaps we could cat ~/dc_sample_data/sra_metadata/SraRunTable.txt?
+1
@taylorreiter I agree that cat applied to fastq files is not a best-practice example. On the other hand, doesn't it give the instructor a chance to explain why cat is not such a good idea here? I feel that using another file at this point in the lesson would break the flow of the narrative.
Thanks @aschuerch, this is a good perspective.
This part of the lesson comes very early in the workshop. According to instructor training, we should teach the most useful things first, so if the only purpose of this line is to show learners an unhelpful way to look at files, I say we omit it altogether.
However, if teaching cat is important, I think cat ~/dc_sample_data/sra_metadata/SraRunTable.txt is a nice way to get learners used to looking at files in a terminal window as opposed to in an Excel spreadsheet. Then again, they may or may not find that useful, and some may find it scary or daunting.
I would be fine with omitting cat here, and I am pretty sure we can do this safely within the shell lesson. However, it would be good to double-check with the subsequent lessons, wrangling-genomics and cloud-genomics, that this doesn't break anything there.
Thanks for thinking of how this would affect later lessons, @aschuerch. I've just searched those two lesson repos and found that cat is used multiple times in each.
https://github.com/datacarpentry/wrangling-genomics/search?q=cat&unscoped_q=cat
https://github.com/datacarpentry/cloud-genomics/search?q=cat&unscoped_q=cat
Based on this, I'm in favor of retaining an introduction to cat in this lesson, and I like @taylorreiter's suggestion to have the learners cat the metadata file instead of a FASTQ file. I'm happy to put in a PR for this if it would be useful. Please let me know, @aschuerch.
With the recent move of the 'file manipulation' part to Extra, we no longer touch the metadata within the regular shell lesson. We could use cat on the metadata to demonstrate its usefulness, but it would mean switching directories and introducing a new file. I would advise against it. In the spirit of 'teach most useful first', how about we move head and tail, and the 'Details on FASTQ format' section, before cat and less?
I agree, @aschuerch. In addition, since the overview lists the question "How can I view and search file contents?", I would also consider adding the grep command to teach learners another way to search file contents, e.g. "grep SAMtools rsmodules.sh".
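To make the "head and tail before cat" ordering concrete, here is a minimal shell sketch. It is not taken from the lesson: the file name example.fastq and the three reads are hypothetical placeholders standing in for the full-size FASTQ files learners work with. It builds a tiny file, reports how many lines cat would dump, shows the single records that head -n 4 and tail -n 4 print, and uses grep to find the line containing a given sequence.

```shell
#!/usr/bin/env bash
# Sketch: contrast cat, head, tail, and grep on a tiny FASTQ file.
# The file name and read contents are hypothetical placeholders.
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT   # clean up the temporary directory on exit
fq="$tmpdir/example.fastq"

# Each FASTQ record is 4 lines: header, sequence, '+' separator, quality.
printf '%s\n' \
  '@read1' 'ACGTACGT' '+' 'IIIIIIII' \
  '@read2' 'TTGGCCAA' '+' 'IIIIIIII' \
  '@read3' 'GGGGAAAA' '+' 'IIIIIIII' > "$fq"

# cat would print every line; count them instead of flooding the terminal.
line_count=$(wc -l < "$fq" | tr -d ' ')
echo "cat would print $line_count lines"

# head -n 4 prints only the first record, however large the file is.
first_record=$(head -n 4 "$fq")
printf '%s\n' "First record (head -n 4):" "$first_record"

# tail -n 4 prints only the last record.
last_record=$(tail -n 4 "$fq")
printf '%s\n' "Last record (tail -n 4):" "$last_record"

# grep prints only the lines that match a pattern.
match=$(grep 'TTGGCCAA' "$fq")
printf '%s\n' "grep match:" "$match"
```

On a real FASTQ file the line count from cat grows with the number of reads, while head -n 4 and tail -n 4 always show one record, which is the practical argument for introducing them first.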