
Jagged kmer coverage profiles with gzipped FASTA

Open warrenlr opened this issue 4 years ago • 3 comments

We discovered inconsistencies in kmer histograms between uncompressed and gzip-compressed FASTA input files for two experimental ONT datasets*. In independent runs testing different k values (16, 18, 20, 22, 25), the gzipped FASTA read files of both datasets (NA19240 [PRJEB29523] and NA12878 [SRR10965087]) yielded jagged, uninterpretable kmer profiles. The problem is exacerbated at higher k values. Issue observed with ntCard v1.1.1, v1.2.1 and v1.2.2.

Figure: NA12878 ONT FASTA, uncompressed (log10)

Figure: NA12878 ONT FASTA, gzipped (log10)

====

Figure: NA19240 ONT FASTA, uncompressed (log10)

Figure: NA19240 ONT FASTA, gzipped (log10)

*We have only observed this with FASTA files, not FASTQ files, and only when using experimental nanopore data.
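One way to narrow this down independently of ntCard is to count kmers exactly on a small subset of reads and check that the plain and gzipped copies give the same histogram. The following is a minimal sketch only: the function names are hypothetical, the counting is exact and memory-hungry (ntCard produces estimates, so its numbers will not match these exactly), and canonical (strand-agnostic) counting is an assumption about how the kmers should be compared.

```python
import gzip
from collections import Counter

# Hypothetical helper names; this is not ntCard code.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def open_fasta(path):
    """Open a FASTA file as text, using gzip when the name ends in .gz."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def read_sequences(path):
    """Yield one uppercase sequence per record, joining wrapped lines."""
    chunks = []
    with open_fasta(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks).upper()
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()


def canonical(kmer):
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_histogram(path, k):
    """Map multiplicity -> number of distinct canonical kmers with that multiplicity."""
    counts = Counter()
    valid = set("ACGT")
    for seq in read_sequences(path):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= valid:  # skip kmers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())
```

If `kmer_histogram("subset.fa", 20)` and `kmer_histogram("subset.fa.gz", 20)` agree (file names here are placeholders), the data in the two files are equivalent and the discrepancy is more likely in how the compressed stream is parsed.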

warrenlr • Dec 17 '20 17:12

Might be due to how compressed multi-line/single-line FASTA records are streamed in. Can you give this a try with ntCard v1.1.1?
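To test whether line wrapping is the trigger, the same reads can be rewritten with each sequence on a single line and run again, both plain and gzipped. A rough sketch, assuming the `.gz` extension marks compressed files; `linearize_fasta` is a hypothetical helper, not part of ntCard:

```python
import gzip


def linearize_fasta(src, dst):
    """Rewrite src so each record's sequence is on one line; return record count."""

    def opener(path, mode):
        # Text-mode gzip for .gz paths, ordinary text file otherwise.
        if path.endswith(".gz"):
            return gzip.open(path, mode + "t")
        return open(path, mode)

    records = 0
    with opener(src, "r") as fin, opener(dst, "w") as fout:
        in_sequence = False
        for line in fin:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                if in_sequence:
                    fout.write("\n")  # terminate the previous record's sequence
                fout.write(line + "\n")
                in_sequence = False
                records += 1
            elif line:
                fout.write(line)  # append wrapped sequence line without a newline
                in_sequence = True
        if in_sequence:
            fout.write("\n")
    return records
```

If the jagged profile disappears on the single-line gzipped file but persists on the multi-line one, that would point at record streaming rather than at the data.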

hmohamadi • Dec 18 '20 19:12

yes, "Issue observed with ntcard v1.1.1, v1.2.1 and v1.2.2"

warrenlr • Dec 18 '20 19:12

thanks. will investigate this.

hmohamadi • Dec 18 '20 19:12