KMC

consistency check of the input format

Open · tlemane opened this issue 2 years ago · 3 comments

Hello,

Thank you for developing kmc.

I ran into an issue today before realizing I was using the wrong flag. When I use -fa (FASTA) instead of -fm (multi FASTA), kmc (v3.2.1) runs without any error or warning but produces clearly incorrect results.

Here is an example on a multi-line FASTA file:

-fa:

Total    : 14.8039s
Stats:
   No. of k-mers below min. threshold :            0
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :           45
   No. of unique counted k-mers       :           45
   Total no. of k-mers                :           45
   Total no. of reads                 :            1
   Total no. of super-k-mers          :            7

-fm:

Total    : 31.5231s
Stats:
   No. of k-mers below min. threshold :            0
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :    691228814
   No. of unique counted k-mers       :    691228814
   Total no. of k-mers                :   2136937309
   Total no. of sequences             :     28120374
   Total no. of super-k-mers          :    255510539

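I think it would be useful to add a quick consistency check of the input format, so that a mismatch like this produces at least a warning instead of passing silently.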
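A check like this could be cheap if it only samples the first records of the file. The Python sketch below is illustrative only and is not KMC code. It assumes the input is plain, uncompressed FASTA (no FASTQ or gzip handling). The function names `looks_multiline_fasta` and `check_input_format` and the `max_records` parameter are hypothetical. The idea is that a record with more than one sequence line between two headers means the file is multi-line FASTA, so `-fa` is probably the wrong flag.

```python
from typing import Iterable, Optional


def looks_multiline_fasta(lines: Iterable[str], max_records: int = 1000) -> bool:
    """Return True if any of the first `max_records` FASTA records spans
    more than one sequence line, i.e. the file looks like multi-line FASTA.

    Hypothetical helper for illustration; assumes plain FASTA input.
    """
    records_seen = 0
    seq_lines_in_record = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records_seen += 1
            if records_seen > max_records:
                break  # only sample the beginning of the file
            seq_lines_in_record = 0
        else:
            seq_lines_in_record += 1
            if seq_lines_in_record > 1:
                return True  # second sequence line in the same record
    return False


def check_input_format(flag: str, lines: Iterable[str]) -> Optional[str]:
    """Return a warning message if the flag does not match the sampled input,
    or None if no mismatch was detected (hypothetical check, not a KMC API)."""
    if flag == "-fa" and looks_multiline_fasta(lines):
        return ("input looks like multi-line FASTA; "
                "did you mean -fm instead of -fa?")
    return None


if __name__ == "__main__":
    single = [">r1", "ACGTACGT", ">r2", "TTGGCCAA"]
    multi = [">chr1", "ACGTACGT", "TTGGCCAA", ">chr2", "GGGG"]
    print(check_input_format("-fa", single))  # no warning
    print(check_input_format("-fa", multi))   # warning suggesting -fm
```

Sampling only the first records keeps the cost negligible compared with the counting itself. A real implementation would of course live in KMC's own C++ input reader and would also need to handle compressed input.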

Best, Téo

tlemane, Feb 11 '22 16:02