
Inquiry on result interpretation

Open LanSabb opened this issue 9 months ago • 1 comments

Hi

I am attempting to dereplicate my reconstructed MAGs at the species level using the following commands:

In="CheckM2 output"
In2="Folder possessing MAGs"/*
Out="Output folder"
mkdir -p $Out
dRep dereplicate -p 80 -comp 50 -con 10 -sa 0.965 --S_algorithm gANI -nc 0.6 --genomeInfo $In $Out -g $In2

The thresholds (0.965 for -sa and 0.6 for -nc) are those suggested in a previous study. Reference: Varghese, Neha J., et al. "Microbial species delineation using whole genome sequences." Nucleic Acids Research 43.14 (2015): 6761-6771.

The attached output lists the dereplicated MAGs and their taxonomic profiles from GTDB-Tk.

Looking at the results, several of the dereplicated MAGs are assigned to the same species.

For example, bin29 and bin51 are both classified as Nitrospira_sp009594995, with GCA_009594995.1 as their closest relative.

Does this mean that dereplication does not work well with these parameters? How should I interpret this result, and how can I dereplicate the MAGs at the species level if this is not the proper method?

output.txt

Thanks!

LanSabb, Apr 09 '25 12:04

Hi @LanSabb - GTDB uses a very complex method for its dereplication, which includes things like preserving the names of historic species on a case-by-case basis and adjusting thresholds between ~94.5% and ~97% depending on the species in question. Because of this, it's not possible to exactly recapitulate GTDB species with dRep, but the thresholds you're using will get you very close. If you'd like to do species dereplication exactly like GTDB, you could always just manually pick the member of each GTDB species with the highest score.
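A minimal sketch of that manual selection, in Python, might look like the following. It assumes you have already loaded two lookups yourself: one from genome name to its GTDB-Tk lineage string, and one from genome name to a quality score (for example the score dRep reports, or any score of your own). The function names, the abbreviated lineage strings, and the score values are all hypothetical; only the bin and species names come from this thread.

```python
def species_of(lineage):
    """Return the species name from a GTDB lineage string, or None if unassigned."""
    for rank in lineage.split(";"):
        rank = rank.strip()
        # GTDB marks the species rank with the prefix "s__"; an empty
        # "s__" means no species was assigned.
        if rank.startswith("s__") and len(rank) > 3:
            return rank[3:]
    return None


def pick_representatives(lineages, scores):
    """Keep the highest-scoring genome for each GTDB species.

    lineages: dict mapping genome name -> GTDB lineage string
    scores:   dict mapping genome name -> numeric score
    Genomes without a species assignment are kept individually.
    """
    best = {}
    for genome, lineage in lineages.items():
        key = species_of(lineage) or f"unclassified:{genome}"
        score = scores.get(genome, float("-inf"))
        if key not in best or score > best[key][1]:
            best[key] = (genome, score)
    return {key: genome for key, (genome, _score) in best.items()}


# Hypothetical inputs: lineages are abbreviated and scores are made up.
lineages = {
    "bin29": "d__Bacteria;g__Nitrospira;s__Nitrospira_sp009594995",
    "bin51": "d__Bacteria;g__Nitrospira;s__Nitrospira_sp009594995",
    "bin07": "d__Bacteria;g__Nitrospira;s__",
}
scores = {"bin29": 85.2, "bin51": 91.0, "bin07": 70.3}

representatives = pick_representatives(lineages, scores)
for species, genome in sorted(representatives.items()):
    print(species, genome)
```

With these made-up scores, bin51 would be kept for Nitrospira_sp009594995 and bin29 dropped, while the bin with no species assignment is kept on its own. Which score to use, and how to treat unclassified bins, is a choice you would need to make for your own dataset.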

Best, Matt

MrOlm, Apr 09 '25 16:04