sc2-illumina-pipeline
Unpaired reads and bad test sample
While getting single-end reads working, I noticed some strange behavior with the test sample RR057e_00734_subsampled.
It seems like there's an issue with the mate pairing for this test sample. When I align it as single-end, or skip the host filtering and kraken steps, the pipeline recovers a full genome. In the latter case (paired but without filtering), very few of the aligned reads are properly paired.
When I run the pipeline under the standard settings (paired, with the filtering steps), the unpaired alignments seem to get discarded, and no genome is recovered.
However, the read headers in the two fastq files do seem to be properly matched, so the problem doesn't look like a simple mismatch in read names.
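For anyone who wants to repeat the header check, here is a minimal sketch of one way to do it. It is not part of the pipeline: the function names are made up for illustration, and it assumes plain 4-line FASTQ records with R1 and R2 in the same order, where read names may carry a trailing `/1` or `/2` or an Illumina-style comment after a space.

```python
import gzip
from itertools import zip_longest


def normalize_id(header):
    """Reduce a FASTQ header line to the bare read name."""
    name = header.strip().split()[0]  # drop any comment after the first space
    if name.startswith("@"):
        name = name[1:]
    if name.endswith(("/1", "/2")):  # drop an old-style mate suffix
        name = name[:-2]
    return name


def read_ids(path):
    """Yield the normalized read name of every record in a FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # header is the first line of each 4-line record
                yield normalize_id(line)


def count_mismatched_pairs(r1_path, r2_path):
    """Count record positions where the R1 and R2 read names differ.

    A record missing from the shorter file also counts as a mismatch.
    """
    mismatches = 0
    for id1, id2 in zip_longest(read_ids(r1_path), read_ids(r2_path)):
        if id1 != id2:
            mismatches += 1
    return mismatches
```

A result of 0 from `count_mismatched_pairs` on the R1 and R2 files of the sample would agree with the observation above that the headers are matched, which points at the alignment or filtering steps rather than the read names.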
We should do 2 things:
- Remove this weird sample and replace it with a better one for the unit test.
- Consider whether we want to keep aligned reads whose mates don't align. It seems like they are getting discarded now.
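To help with the second point, the sketch below tallies alignments by pairing status using the standard SAM flag bits (0x1 paired, 0x2 proper pair, 0x4 unmapped, 0x8 mate unmapped). It is only a diagnostic sketch, not the filter the pipeline actually applies; the function names and category labels are hypothetical, and it assumes uncompressed SAM text as input.

```python
from collections import Counter

# Flag bits defined by the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def classify(flag):
    """Map a SAM flag to a pairing category (labels are illustrative)."""
    if flag & UNMAPPED:
        return "unmapped"
    if not flag & PAIRED:
        return "aligned_single_end"
    if flag & PROPER_PAIR:
        return "aligned_proper_pair"
    if flag & MATE_UNMAPPED:
        return "aligned_mate_unmapped"
    return "aligned_improper_pair"


def tally_pairing(sam_lines):
    """Count primary alignment records per pairing category."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, by its primary record
        counts[classify(flag)] += 1
    return counts
```

One way to use it would be to pipe the output of `samtools view` for the aligned BAM into a small script that calls `tally_pairing(sys.stdin)`. If most aligned reads land in `aligned_mate_unmapped` or `aligned_improper_pair`, then any step that keeps only proper pairs would throw away nearly all of the coverage for this sample.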