
additional steps

Open aschuerch opened this issue 5 years ago • 0 comments

Opening this issue on behalf of @cmavian (see issue #246).

Teaching this module, I found that in this part of the lesson:

Bad reads have a lot of N’s, so we’re going to look for NNNNNNNNNN with grep. We want the whole FASTQ record, so we’re also going to get the one line above the sequence and the two lines below. We also want to look in all the files that end with .fastq, so we’re going to use the * wildcard.

grep -B1 -A2 NNNNNNNNNN *.fastq > scripted_bad_reads.txt

We’re going to create a new file to put this command in. We’ll call it bad-reads-script.sh. The sh isn’t required, but using that extension tells us that it’s a shell script.

$ nano bad-reads-script.sh

Type your grep command into the file and save it as before. Be careful that you did not add the $ at the beginning of the line. Now comes the neat part. We can run this script. Type:

$ bash bad-reads-script.sh

It will look like nothing happened, but now if you look at scripted_bad_reads.txt, you can see that there are now reads in the file.

I think it would be helpful, after creating "scripted_bad_reads.txt" with the command grep -B1 -A2 NNNNNNNNNN *.fastq > scripted_bad_reads.txt, to display the text file and then remove it before introducing $ nano bad-reads-script.sh, which creates the file again. That way the students can see how the script works. Alternatively, have the script write to "scripted_bad_reads2.txt" so that students can see that the script worked.
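The suggested order of steps could look roughly like the sketch below. It is only an illustration: the scratch directory, the `sample.fastq` file and its contents are hypothetical stand-ins so that the commands run anywhere, and `echo` replaces the interactive `nano` step. In the lesson itself, students would work on the real .fastq files.

```shell
# Hypothetical demo setup (not part of the lesson): a scratch directory
# with one tiny FASTQ record containing a run of ten N's.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf '%s\n' '@read1' 'ACGTNNNNNNNNNNACGT' '+' 'IIIIIIIIIIIIIIIIII' > sample.fastq

# 1. Run the command by hand and look at the result.
grep -B1 -A2 NNNNNNNNNN *.fastq > scripted_bad_reads.txt
head -n 4 scripted_bad_reads.txt

# 2. Remove the output so the script's effect is visible later.
rm scripted_bad_reads.txt

# 3. Put the same command in a script (the lesson uses nano for this step).
echo 'grep -B1 -A2 NNNNNNNNNN *.fastq > scripted_bad_reads.txt' > bad-reads-script.sh

# 4. Run the script and confirm that the file has been recreated.
bash bad-reads-script.sh
ls -l scripted_bad_reads.txt
```

For the alternative, the redirect inside the script would point at scripted_bad_reads2.txt instead, and step 2 could be skipped.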

aschuerch avatar Jul 18 '19 07:07 aschuerch