seqtk
Request: Add seqtk shuffle command to randomise order of reads
I have been creating mock community samples using seqtk sample on some single isolate inputs, something like this:
rm -rf tempR1.fastq tempR2.fastq
for sample in A B C; do
    seqtk sample -s 123 input${sample}_R1.fastq.gz 10000 >> tempR1.fastq
    seqtk sample -s 123 input${sample}_R2.fastq.gz 10000 >> tempR2.fastq
done
gzip tempR1.fastq
gzip tempR2.fastq
In this example my combined FASTQ files will have the reads from sample A, then sample B, and finally sample C, and this ordering may introduce biases in the downstream analysis.
What I would like to do is finish with something like this:
seqtk shuffle -s 123 tempR1.fastq | gzip > mixed_R1.fastq.gz
seqtk shuffle -s 123 tempR2.fastq | gzip > mixed_R2.fastq.gz
Here I am assuming -s would set the random number seed, as it does in seqtk sample, to ensure that both R1 and R2 are randomised in the same way and the output remains nicely paired.
@peterjc Until this is implemented, you can use seqkit shuffle
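Another stop-gap that needs only standard Unix tools is to flatten each FASTQ record onto one line, glue the R1 and R2 records side by side, shuffle those combined lines with a seeded random key, and split them apart again. The sketch below is only an illustration: shuffle_pairs is a hypothetical helper (not part of seqtk or seqkit), and it assumes uncompressed 4-line FASTQ records, no tab characters in any line, and the same number of reads in both files. The order produced for a given seed depends on the awk implementation in use.

```shell
# Hypothetical helper, not a seqtk command: shuffle two paired FASTQ files
# in lockstep so that mates stay on matching records.
# Usage: shuffle_pairs in_R1.fastq in_R2.fastq out_R1.fastq out_R2.fastq SEED
shuffle_pairs() {
    r1=$1; r2=$2; out1=$3; out2=$4; seed=$5
    tmp=$(mktemp) || return 1

    # Flatten each 4-line R1 record onto one tab-separated line.
    paste - - - - < "$r1" > "$tmp"

    # Flatten R2 the same way, join it to R1 (8 fields per line), prefix a
    # seeded random key, sort on that key, then split the pairs back out.
    paste - - - - < "$r2" \
      | paste "$tmp" - \
      | awk -v seed="$seed" 'BEGIN { srand(seed) } { printf("%.10f\t%s\n", rand(), $0) }' \
      | LC_ALL=C sort -k1,1n \
      | awk -F'\t' -v o1="$out1" -v o2="$out2" '{
            printf("%s\n%s\n%s\n%s\n", $2, $3, $4, $5) > o1
            printf("%s\n%s\n%s\n%s\n", $6, $7, $8, $9) > o2
        }'

    rm -f "$tmp"
}

# Example call, using the file names from the request above:
# shuffle_pairs tempR1.fastq tempR2.fastq mixed_R1.fastq mixed_R2.fastq 123
```

Because both mates travel on the same line through the shuffle, pairing is preserved by construction rather than by relying on two separate runs picking the same permutation.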