vsearch
Unexpected behaviour: sintax in v2.28 assigns an early database sequence when the query is not classifiable
Command:

`$ vsearch --db UNITE10.fasta --sintax refs.fasta --tabbedout refs_sintaxonomy.tsv --sintax_cutoff 0.8 --sintax_random`

Version banner: `vsearch v2.28.1_linux_x86_64, 251.2GB RAM, 96 cores`
This produces the following output for the first 5 sequences (which are not closely related to each other):
| Query | Prediction (bootstrap support) |
|---|---|
| Laccaria amethystina ITS | d:Fungi(1.00),p:Basidiomycota(0.79),c:Agaricomycetes(0.78),o:Boletales(0.73),f:Paxillaceae(0.73),g:Melanogaster(0.73),s:SH0000009.10FU(0.73) |
| Tomentella sublilacina ITS | d:Fungi(1.00),p:Basidiomycota(0.74),c:Agaricomycetes(0.74),o:Boletales(0.48),f:Paxillaceae(0.48),g:Melanogaster(0.48),s:SH0000009.10FU(0.48) |
| Inocybe napipes ITS | d:Fungi(1.00),p:Basidiomycota(0.90),c:Agaricomycetes(0.90),o:Boletales(0.77),f:Paxillaceae(0.77),g:Melanogaster(0.77),s:SH0000009.10FU(0.77) |
| Lactarius subdulcis ITS | d:Fungi(1.00),p:Ascomycota(0.44),c:Agaricomycetes(0.37),o:Saccharomycetales(0.16),f:Paxillaceae(0.15),g:Melanogaster(0.15),s:SH0000009.10FU(0.15) |
| Russula ochroleuca ITS | d:Fungi(1.00),p:Basidiomycota(0.46),c:Agaricomycetes(0.44),o:Boletales(0.33),f:Paxillaceae(0.33),g:Melanogaster(0.33),s:SH0000009.10FU(0.33) |
All five sequences are assigned to Melanogaster (SH0000009.10FU), which is definitely not their closest match in the database. Perhaps they are not 'classifiable' and are therefore assigned to the first roughly similar sequence in the DB file? The Melanogaster reference is the 9th sequence in a reference file of 159189 sequences.
The previous version I used (v2.21) left these sequences unclassified.
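
For anyone triaging this, here is a minimal Python sketch of how a 0.8 cutoff would filter the prediction strings shown above. It assumes the `rank:name(support)` format seen in the output and assumes that filtering stops at the first rank whose support falls below the cutoff. It is not vsearch's own implementation, and `parse_prediction` and `apply_cutoff` are hypothetical helper names.

```python
import re

# One "rank:name(support)" entry, e.g. "p:Basidiomycota(0.79)".
ENTRY = re.compile(r"^([a-z]):(.+)\(([0-9.]+)\)$")


def parse_prediction(pred):
    """Split a sintax prediction string into (rank, name, support) tuples."""
    entries = []
    for field in pred.split(","):
        match = ENTRY.match(field.strip())
        if match is None:
            raise ValueError(f"unexpected entry: {field!r}")
        entries.append((match.group(1), match.group(2), float(match.group(3))))
    return entries


def apply_cutoff(entries, cutoff):
    """Keep leading ranks with support >= cutoff; stop at the first failure.

    Assumption: truncation at the first failing rank, not per-rank filtering.
    """
    kept = []
    for rank, name, support in entries:
        if support < cutoff:
            break
        kept.append(f"{rank}:{name}")
    return ",".join(kept)


# Two of the prediction strings from the report above.
rows = {
    "Laccaria amethystina ITS": (
        "d:Fungi(1.00),p:Basidiomycota(0.79),c:Agaricomycetes(0.78),"
        "o:Boletales(0.73),f:Paxillaceae(0.73),g:Melanogaster(0.73),"
        "s:SH0000009.10FU(0.73)"
    ),
    "Inocybe napipes ITS": (
        "d:Fungi(1.00),p:Basidiomycota(0.90),c:Agaricomycetes(0.90),"
        "o:Boletales(0.77),f:Paxillaceae(0.77),g:Melanogaster(0.77),"
        "s:SH0000009.10FU(0.77)"
    ),
}

for query, pred in rows.items():
    print(query, apply_cutoff(parse_prediction(pred), 0.8), sep="\t")
```

Under these assumptions, none of the five queries keeps the Melanogaster genus or species assignment at a cutoff of 0.8, so the concern is with the unfiltered prediction column rather than the cutoff-filtered one.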