ntCard
Jagged kmer coverage profiles with gzipped FASTA
We discovered inconsistencies in k-mer histograms between uncompressed and gzip-compressed FASTA input files* on two experimental ONT datasets. In independent runs testing different k values (16, 18, 20, 22, 25), the gzipped FASTA read files for both ONT datasets (NA19240 [PRJEB29523] and NA12878 [SRR10965087]) yielded jagged and uninterpretable k-mer profiles, while the uncompressed files did not. The problem is exacerbated at higher k values. Issue observed with ntcard v1.1.1, v1.2.1 and v1.2.2.
NA12878 ONT FASTA
NA12878 ONT FASTA GZIPPED
====
NA19240 ONT FASTA
NA19240 ONT FASTA GZIPPED
*We have only observed this with FASTA files, not FASTQ files, and only when using experimental nanopore data.
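For anyone trying to reproduce this, an exact reference histogram on a small subsample can help tell which of the two profiles is the correct one. The following is a minimal sketch, not ntCard's method: it counts k-mers exactly in memory (so it only suits small inputs), and it assumes canonical k-mers and skips any k-mer containing a non-ACGT base. The function names are hypothetical.

```python
import gzip
from collections import Counter

def open_maybe_gzip(path):
    """Open a text file, decompressing if it starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")

def read_fasta(path):
    """Yield (header, sequence) pairs, joining multi-line sequence records."""
    header, chunks = None, []
    with open_maybe_gzip(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])

def kmer_histogram(path, k):
    """Map multiplicity -> number of distinct canonical k-mers seen that many times.

    Assumption: k-mers containing anything other than A, C, G, T are skipped.
    """
    counts = Counter()
    for _, seq in read_fasta(path):
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())
```

Calling `kmer_histogram` on the plain file and on its gzipped copy with the same `k` should return equal results if both files hold the same records.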
Might be due to streaming in compressed multi-line/single-line FASTA records. Can you give this a try with ntCard v1.1.1?
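One way to test the multi-line hypothesis is to rewrite the reads so that every record has exactly one sequence line, then rerun on both the plain and gzipped versions of that file. A minimal sketch, assuming a `.gz` suffix means gzip compression; `linearize_fasta` is a hypothetical helper name, not part of ntCard:

```python
import gzip

def _open(path, mode):
    """Open plain or gzip text by extension (assumption: '.gz' means gzip)."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)

def linearize_fasta(src, dst):
    """Rewrite src so each record is one header line plus one sequence line.

    Returns the number of records written.
    """
    n_records = 0
    with _open(src, "r") as fin, _open(dst, "w") as fout:
        first = True
        for line in fin:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                if not first:
                    fout.write("\n")  # terminate the previous sequence line
                fout.write(line + "\n")
                first = False
                n_records += 1
            elif line:
                fout.write(line)  # append to the current sequence line
        if not first:
            fout.write("\n")  # terminate the last sequence line
    return n_records
```

If the jagged profile persists with single-line gzipped input, line wrapping is unlikely to be the cause.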
yes, "Issue observed with ntcard v1.1.1, v1.2.1 and v1.2.2"
thanks. will investigate this.