KMC
consistency check of the input format
Hello,
Thank you for developing kmc.
I ran into an issue today before I realized I was using the wrong flag. When run with -fa instead of -fm on a multiline fasta, kmc (v3.2.1) completes without any error or warning but produces incorrect results.
Here is an example on a multiline fasta:
-fa:
Total : 14.8039s
Stats:
No. of k-mers below min. threshold : 0
No. of k-mers above max. threshold : 0
No. of unique k-mers : 45
No. of unique counted k-mers : 45
Total no. of k-mers : 45
Total no. of reads : 1
Total no. of super-k-mers : 7
-fm:
Total : 31.5231s
Stats:
No. of k-mers below min. threshold : 0
No. of k-mers above max. threshold : 0
No. of unique k-mers : 691228814
No. of unique counted k-mers : 691228814
Total no. of k-mers : 2136937309
Total no. of sequences : 28120374
Total no. of super-k-mers : 255510539
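I think it would be useful to add a quick consistency check of the input format, for example a warning when -fa is given but the file looks like a multiline fasta.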
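As a rough illustration of the kind of check I mean, here is a minimal sketch in Python (for brevity; this is not KMC code, and `looks_multiline_fasta` and `max_lines` are hypothetical names). It assumes that looking at the first lines of the file is enough: if any record has more than one sequence line, the file is multiline.

```python
from itertools import islice
from typing import Iterable


def looks_multiline_fasta(lines: Iterable[str], max_lines: int = 1000) -> bool:
    """Return True if a record within the first `max_lines` lines
    spans more than one sequence line.

    Hypothetical helper for illustration only, not part of KMC.
    """
    seq_lines_in_record = 0
    for raw in islice(lines, max_lines):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            seq_lines_in_record = 0  # a header starts a new record
        else:
            seq_lines_in_record += 1
            if seq_lines_in_record > 1:
                return True  # second sequence line under the same header
    return False
```

A caller could pass an open file handle (which iterates line by line) and print a warning suggesting -fm when the function returns True. Only a bounded prefix of the file is read, so the check stays cheap even on large inputs.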
Best, Téo