Mash
screen: spurious "no valid k-mers" warning for amino acid sketches
The results seem to be okay, so there clearly are valid k-mers. The problem appears to be limited to estimating the set size of the k-mer pool, which then triggers an inaccurate warning.
+1, just saw this with a set of 2,408 OrthoDB sequences against some MinION data:
```
$ mash sketch -a -i gene.orthodb.fasta
$ mash screen -p 30 -w gene.orthodb.fasta.msh reads.fastq.gz > reads.gene.mash.tsv
Loading gene.orthodb.fasta.msh...
   281171 distinct hashes.
Translating from 16 inputs...
   Estimated distinct (translated) k-mers in pool: 0
WARNING: no valid k-mers in input.
Summing shared...
Reallocating to winners...
Computing coverage medians...
Writing output...
```
But the output reports hits for 1315 of the protein sequences, and they seem fairly congruent with whole-genome searches.
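For anyone who wants to check the same thing on their own run, here is a rough sketch of how the hit count could be tallied from the screen output. It assumes the usual tab-separated `mash screen` layout (identity in the first column, query ID in the fifth); the function name `count_screen_hits` and the `min_identity` parameter are made up for illustration and are not part of Mash.

```python
import sys


def count_screen_hits(lines, min_identity=0.0):
    """Count distinct query IDs whose identity exceeds min_identity.

    Assumes each line is tab-separated with identity in column 1
    and query ID in column 5 (an assumption about the TSV layout).
    """
    hits = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        if float(fields[0]) > min_identity:
            hits.add(fields[4])
    return len(hits)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python count_hits.py reads.gene.mash.tsv
    with open(sys.argv[1]) as handle:
        print(count_screen_hits(handle))
```

A non-zero count here, alongside the "no valid k-mers" warning in the log, is what suggests the warning is spurious rather than a sign of an empty result.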