
Unexpected behaviour: v2.28 sintax chooses the first sequence when a query is not classifiable

Open · givdieri opened this issue 1 month ago · 4 comments

command:

$ vsearch --db UNITE10.fasta --sintax refs.fasta --tabbedout refs_sintaxonomy.tsv --sintax_cutoff 0.8 --sintax_random

vsearch v2.28.1_linux_x86_64, 251.2GB RAM, 96 cores

This produces the following output for the first five sequences (which are not closely related):

Laccaria amethystina ITS d:Fungi(1.00),p:Basidiomycota(0.79),c:Agaricomycetes(0.78),o:Boletales(0.73),f:Paxillaceae(0.73),g:Melanogaster(0.73),s:SH0000009.10FU(0.73)
Tomentella sublilacina ITS d:Fungi(1.00),p:Basidiomycota(0.74),c:Agaricomycetes(0.74),o:Boletales(0.48),f:Paxillaceae(0.48),g:Melanogaster(0.48),s:SH0000009.10FU(0.48)
Inocybe napipes ITS d:Fungi(1.00),p:Basidiomycota(0.90),c:Agaricomycetes(0.90),o:Boletales(0.77),f:Paxillaceae(0.77),g:Melanogaster(0.77),s:SH0000009.10FU(0.77)
Lactarius subdulcis ITS d:Fungi(1.00),p:Ascomycota(0.44),c:Agaricomycetes(0.37),o:Saccharomycetales(0.16),f:Paxillaceae(0.15),g:Melanogaster(0.15),s:SH0000009.10FU(0.15)
Russula ochroleuca ITS d:Fungi(1.00),p:Basidiomycota(0.46),c:Agaricomycetes(0.44),o:Boletales(0.33),f:Paxillaceae(0.33),g:Melanogaster(0.33),s:SH0000009.10FU(0.33)
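For reference, the confidence values above can be checked against the `--sintax_cutoff 0.8` threshold used in the command. The minimal Python sketch below parses a prediction string in the format shown and keeps ranks from the top down until the first rank whose confidence is below the cutoff. That truncation rule is an assumption about how the cutoff is applied, not vsearch code, and `parse_prediction` / `apply_cutoff` are hypothetical helper names.

```python
import re

# Matches one rank entry such as "p:Basidiomycota(0.79)".
RANK_RE = re.compile(r"([a-z]):([^,(]+)\(([0-9.]+)\)")


def parse_prediction(prediction):
    """Split a SINTAX-style prediction string into (rank, name, confidence) tuples."""
    return [(rank, name, float(conf)) for rank, name, conf in RANK_RE.findall(prediction)]


def apply_cutoff(prediction, cutoff):
    """Hypothetical helper: keep ranks from the top down, stopping at the
    first rank whose bootstrap confidence is below the cutoff (assumed rule)."""
    kept = []
    for rank, name, conf in parse_prediction(prediction):
        if conf < cutoff:
            break
        kept.append(f"{rank}:{name}")
    return ",".join(kept)


laccaria = ("d:Fungi(1.00),p:Basidiomycota(0.79),c:Agaricomycetes(0.78),"
            "o:Boletales(0.73),f:Paxillaceae(0.73),g:Melanogaster(0.73),"
            "s:SH0000009.10FU(0.73)")
inocybe = ("d:Fungi(1.00),p:Basidiomycota(0.90),c:Agaricomycetes(0.90),"
           "o:Boletales(0.77),f:Paxillaceae(0.77),g:Melanogaster(0.77),"
           "s:SH0000009.10FU(0.77)")

print(apply_cutoff(laccaria, 0.8))  # d:Fungi
print(apply_cutoff(inocybe, 0.8))   # d:Fungi,p:Basidiomycota,c:Agaricomycetes
```

Under that assumption, none of the five genus-level Melanogaster assignments (confidences 0.73, 0.48, 0.77, 0.15 and 0.33) would pass the 0.8 cutoff.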

All of these sequences are assigned to Melanogaster (SH0000009.10FU). Perhaps they are not "classifiable" and are therefore assigned to the first sufficiently close sequence in the database file? (The Melanogaster reference is the 9th of the 159189 sequences in the reference file.) Melanogaster is definitely not the closest match in the database.
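The hypothesis above can be illustrated with a toy model. The Python sketch below is hypothetical and is not vsearch's implementation. In each bootstrap iteration, the reference sharing the most sampled query k-mers wins. When several references tie, a "first" policy always returns the lowest index, while a "random" policy picks uniformly among the tied references. If a query shares essentially nothing with the database, every iteration is a full tie, so the "first" policy gives all the bootstrap support to whichever reference comes earliest in the file. The function name `bootstrap_winners`, the 20-reference database and the 100 iterations are assumptions chosen for illustration.

```python
import random
from collections import Counter


def bootstrap_winners(shared_counts, policy, rng):
    """Toy model (not vsearch code): shared_counts[i][j] is the number of
    sampled query k-mers shared with reference j in bootstrap iteration i.
    Returns a Counter of how often each reference index wins."""
    wins = Counter()
    for counts in shared_counts:
        best = max(counts)
        tied = [j for j, c in enumerate(counts) if c == best]
        # "first" always takes the lowest index among tied references;
        # "random" picks uniformly among them.
        winner = tied[0] if policy == "first" else rng.choice(tied)
        wins[winner] += 1
    return wins


# Hypothetical worst case: 100 iterations, 20 references, no shared k-mers,
# so every reference ties in every iteration.
iterations, n_refs = 100, 20
all_tied = [[0] * n_refs for _ in range(iterations)]
rng = random.Random(42)

first = bootstrap_winners(all_tied, "first", rng)
rand = bootstrap_winners(all_tied, "random", rng)
print(first.most_common(1))              # [(0, 100)]
print(max(rand.values()) / iterations)   # support is spread out; value depends on the seed
```

This only illustrates the hypothesis in the report. It does not show that vsearch 2.28 actually resolves ties this way.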

With the previous version I used (v2.21), these sequences were left unclassified.

givdieri · May 17 '24 14:05