fqtk
Demultiplexing "N"-barcode as no-op
With https://github.com/fulcrumgenomics/fqtk/pull/30 (and release v0.3.0) fqtk allows Ns in barcodes.
I tried to demultiplex with a sample whose barcode contains only Ns, so that it accepts any sequence, but all reads are written to unmatched.R1.fq.gz instead of the sample's FASTQ file.
Is this intended?
@mschubert would you be willing to share some of your data, or one FASTQ record that should have been matched to a sample along with the expected barcode?
My apologies for the late reply. The issue seems to be with FASTQ records that contain Ns themselves:
# my.fq.gz
@M00872:1070:000000000-GLPWM:1:1101:15776:1330 1:Y:0:1
AAGANNATNGNNGNNANNNTNNNAACGTAGTGCGCCAGCCTATTTCAGTGCTCAATCTTGCAGAGAATACTCTTGAGAGCG
+
AA1A##>>#>##A##A###A###ABBFFFHGGHEGGGGGGHHFHHHHHHHHHGFHHHHHHHHHHHCGHHFHHHGHHHHHHE
@M00872:1070:000000000-GLPWM:1:1101:15866:1331 1:Y:0:1
AAGANNATNGNNGNNANNNTNNNAACGTAGTGCGCATAAGCCGTTCAAGAGGAGCCATTGTGGGGAGGCCCTGGGGACTGG
+
AAAA##>>#>##A##A###A###BABFFHHGGHEEEEGHFFHGEEGHGFHEHHEHGFHHHFGFC>FCGGCEHHHGGAEFG/
# meta.tsv
sample_id barcode
test NNNNNNN
fqtk demux --inputs my.fq.gz --max-mismatches 0 --read-structures 7B+T --sample-metadata meta.tsv --output out
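One possible explanation for the symptom, sketched in Python: the result depends on whether an N in the read (a no-call) counts as a mismatch even when the expected barcode base is also N. The two functions below are hypothetical illustrations of the two policies, not fqtk's actual implementation, and the function names are made up for this example. The observed barcode is the first 7 bases of the records above, as selected by the `7B+T` read structure.

```python
def mismatches_wildcard(observed: str, expected: str) -> int:
    """Policy A (assumed): an N in the expected barcode matches any observed base."""
    return sum(
        1
        for obs, exp in zip(observed, expected)
        if exp != "N" and obs != exp
    )


def mismatches_strict_nocall(observed: str, expected: str) -> int:
    """Policy B (assumed): an N in the read always counts as a mismatch,
    even when the expected barcode base is N."""
    return sum(
        1
        for obs, exp in zip(observed, expected)
        if obs == "N" or (exp != "N" and obs != exp)
    )


# First 7 bases of the reads in my.fq.gz, and the barcode from meta.tsv.
observed = "AAGANNA"
expected = "NNNNNNN"
max_mismatches = 0  # as in --max-mismatches 0

for policy in (mismatches_wildcard, mismatches_strict_nocall):
    n = policy(observed, expected)
    verdict = "matched" if n <= max_mismatches else "unmatched"
    print(f"{policy.__name__}: {n} mismatches -> {verdict}")
```

Under policy A the read would be assigned to the sample; under policy B the two no-calls in the read exceed `--max-mismatches 0` and the read would go to the unmatched file, which is consistent with the behavior reported here.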
Thank you, @mschubert, for the clear report!