Episode 03: Working with Files and Directories -- cat something other than a fastq file

Open taylorreiter opened this issue 6 years ago • 7 comments

The lesson demonstrates cat by applying it to a fastq file. Can we cat a different file instead? When learners go back to working with full-size fastq files, catting the whole file is generally a bad idea because fastq files contain a lot of text. Perhaps we could cat ~/dc_sample_data/sra_metadata/SraRunTable.txt?

taylorreiter avatar Jun 27 '18 21:06 taylorreiter

+1

raynamharris avatar Jun 28 '18 06:06 raynamharris

@taylorreiter I agree that applying cat to fastq files is not a best-practice example. On the other hand, doesn't it give the instructor a chance to explain why cat is not such a good idea here? I feel that using another file at this point in the lesson would break the flow of the narrative.

aschuerch avatar Jun 28 '18 13:06 aschuerch

Thanks @aschuerch. This is a good perspective.

This part of the lesson comes very early in the workshop. According to instructor training, we should teach the most useful things first, so if the only utility of this line is to show learners an unhelpful way to look at files, I say we omit it altogether.

However, if teaching cat is important, I think cat ~/dc_sample_data/sra_metadata/SraRunTable.txt is a nice way to get learners used to looking at files in a terminal window as opposed to in an Excel spreadsheet. Then again, they may or may not find that useful/scary/daunting.

raynamharris avatar Jun 28 '18 18:06 raynamharris

I would be fine with omitting cat here, and I am pretty sure we can do this safely within the shell lesson. However, it would be good to double-check that this doesn't break anything in the subsequent lessons, wrangling-genomics and cloud-genomics.

aschuerch avatar Jun 29 '18 06:06 aschuerch

Thanks for thinking of how this would affect later lessons @aschuerch. I've just done a search in those two lesson repos and found that cat is used multiple times in each.

https://github.com/datacarpentry/wrangling-genomics/search?q=cat&unscoped_q=cat
https://github.com/datacarpentry/cloud-genomics/search?q=cat&unscoped_q=cat

Based on this, I'm in favor of retaining an introduction to cat in this lesson, and I like @taylorreiter's suggestion to have the learners cat the metadata file instead of a FASTQ file. I'm happy to put in a PR for this if it would be useful. Please let me know, @aschuerch.

ErinBecker avatar Apr 10 '19 19:04 ErinBecker

With the recent move of the 'file manipulation' part to Extra, we no longer touch the metadata within the regular shell lesson. We could use cat on the metadata to demonstrate its usefulness, but it would mean switching directories and introducing a new file, so I would advise against it. In favor of 'teach most useful first', how about we move head and tail and Details on FASTQ format before cat and less?
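For illustration, here is a minimal sketch of what a head-and-tail-first look at a FASTQ file might involve. It assumes a tiny stand-in file (demo.fastq is a hypothetical name, not a file from the lesson data) and relies only on the fact that each FASTQ record spans four lines:

```shell
# Build a two-record stand-in FASTQ file (hypothetical data, not from the lesson).
tmpdir=$(mktemp -d)
fq="$tmpdir/demo.fastq"
printf '%s\n' '@read1' 'ACGT' '+' 'IIII' '@read2' 'TTGA' '+' 'FFFF' > "$fq"

# Count lines before deciding whether printing the whole file is sensible.
lines=$(wc -l < "$fq")
echo "$lines"

# Each FASTQ record is four lines, so -n 4 shows exactly one record.
first_record=$(head -n 4 "$fq")
last_record=$(tail -n 4 "$fq")
echo "$first_record"
echo "$last_record"
```

On a full-size FASTQ file, the line count alone is usually enough to show why printing everything with cat is impractical, while head and tail still return one record each.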

aschuerch avatar Apr 11 '19 08:04 aschuerch

I agree, @aschuerch. In addition, since the overview includes the question "How can I view and search file contents", I would also consider adding the grep command to teach learners another way to search the contents of files. E.g., "grep SAMtools rsmodules.sh".
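A minimal sketch of that kind of search, assuming a stand-in for rsmodules.sh (the real file's contents are not shown in this thread, so the three lines below are made up for illustration):

```shell
# Stand-in for rsmodules.sh; these contents are hypothetical.
tmpdir=$(mktemp -d)
script="$tmpdir/rsmodules.sh"
printf '%s\n' 'module load FastQC' 'module load SAMtools' 'module load BCFtools' > "$script"

# Print every line that contains the string "SAMtools".
matches=$(grep SAMtools "$script")
echo "$matches"

# -c reports the number of matching lines instead of printing them.
count=$(grep -c SAMtools "$script")
echo "$count"
```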

ValterAlmeida avatar Dec 09 '21 23:12 ValterAlmeida