Inconsistent taxonomy assignment results for the same sequences

Open · XHe20 opened this issue 1 year ago • 1 comment

I used Diamond and MEGAN to assign taxonomy for my contigs.

diamond blastx -d nrdb.dmnd -q final.contigs.part_001.fa \
-o final.contigs_graham_01.daa -F 15 --range-culling -f 100 \
-t ./ --threads 32 --fast --max-target-seqs 100

daa-meganizer -i final.contigs_graham_01.daa \
-mdb megan-map-Feb2022.db --longReads

I exported the taxonomy assignments at the Class level using MEGAN, and 1306 contigs were assigned to Mammalia. I then re-ran the above commands on just those 1306 sequences, and only 97.6% of them were assigned to Mammalia. This is unexpected: I expected 100% of the 1306 sequences to be assigned to Mammalia again.

Then I set --masking 0 and ran the analysis again.

diamond blastx -d nrdb.dmnd -q final.contigs.part_001.fa \
-o final.contigs_graham_01_2.daa -F 15 --range-culling -f 100 \
-t ./ --threads 32 --fast --max-target-seqs 100 --masking 0

daa-meganizer -i final.contigs_graham_01_2.daa \
-mdb megan-map-Feb2022.db --longReads

I again re-ran the above commands on only the contigs assigned to Mammalia, and this time only 82.2% of the sequences were assigned to Mammalia.

I am wondering what causes this inconsistency and which parameters could be used to make the results more consistent between runs.

XHe20, Feb 22 '24 15:02

I'm not really sure what's happening here; you would have to look at all the alignments in detail.

bbuchfink, Mar 04 '24 14:03
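
A possible first step before going through the alignments by hand is to list exactly which contigs changed their assignment between the full run and the subset re-run, so that only those need to be inspected. The sketch below is a minimal Python example, not part of DIAMOND or MEGAN. It assumes the class-level assignments from each run were exported from MEGAN as two-column, tab-separated text (contig name, then taxon name); the function names `load_assignments` and `compare_runs` and the toy data are hypothetical and only illustrate the comparison.

```python
import csv
from collections import Counter


def load_assignments(lines, delimiter="\t"):
    """Parse 'contig<TAB>taxon' rows into a dict mapping contig -> taxon.

    `lines` can be an open file or any iterable of strings.
    Rows with fewer than two columns are skipped.
    """
    assignments = {}
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) >= 2:
            assignments[row[0].strip()] = row[1].strip()
    return assignments


def compare_runs(first, second, taxon="Mammalia"):
    """Compare two runs for the contigs assigned to `taxon` in the first run.

    Returns (fraction kept, {contig: new assignment}, Counter of new assignments).
    Contigs missing from the second run are reported as 'unassigned'.
    """
    expected = [contig for contig, name in first.items() if name == taxon]
    changed = {
        contig: second.get(contig, "unassigned")
        for contig in expected
        if second.get(contig) != taxon
    }
    kept = len(expected) - len(changed)
    fraction = kept / len(expected) if expected else 0.0
    return fraction, changed, Counter(changed.values())


# Toy data for illustration only; these are not results from the runs above.
run1 = ["c1\tMammalia", "c2\tMammalia", "c3\tAves", "c4\tMammalia"]
run2 = ["c1\tMammalia", "c2\tActinopteri", "c4\tMammalia"]

fraction, changed, new_taxa = compare_runs(
    load_assignments(run1), load_assignments(run2)
)
print(f"{fraction:.1%} of contigs kept their assignment")
print("changed contigs:", changed)
print("new assignments:", dict(new_taxa))
```

With real exports, the two lists would be replaced by open file handles for the two exported tables. The contigs in `changed` are the ones whose alignments in the two .daa files are worth comparing in detail, and the counts in `new_taxa` show whether they moved to a neighbouring class, to a higher rank, or became unassigned.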