Define how many new families I have

Open SilasK opened this issue 4 years ago • 9 comments

I have a bunch of MAGs which I annotated using GTDB-Tk, and I also built a tree based on the markers.

If I have, for example, three genomes that are not annotated at family level but belong to the same order, can I use PhyloRank to show whether these genomes belong to one, two or three new families?

SilasK avatar Mar 13 '20 09:03 SilasK

Yes. Does your tree span an entire domain? Establishing ranks requires comparing the relative evolutionary divergence (RED) between taxa at the same defined rank (i.e. it is a measure relative to your specific tree and not an absolute measure). Assuming you have this, you need to first decorate your tree and then calculate the RED values as described in the README. We are hoping to provide a better solution in the future, but this is ongoing work.
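As a rough illustration of why RED is relative to a specific tree, here is a minimal sketch of one commonly described formulation: the root gets RED 0, every leaf gets RED 1, and each internal node is interpolated from its parent's RED using its branch length and the mean distance to its descendant leaves. The `Node` class, the function names and the toy tree are hypothetical, and the formula details are an assumption rather than PhyloRank's actual code, which should be treated as authoritative.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal tree node; not a PhyloRank or DendroPy class."""
    name: str
    length: float = 0.0                      # branch length to the parent
    children: List["Node"] = field(default_factory=list)
    red: Optional[float] = None


def leaf_distances(node: Node) -> List[float]:
    """Return the distance from `node` to each of its descendant leaves."""
    if not node.children:
        return [0.0]
    return [c.length + d for c in node.children for d in leaf_distances(c)]


def assign_red(node: Node, parent_red: Optional[float] = None) -> None:
    """Assign RED top-down: root = 0, leaves = 1, internal nodes interpolated."""
    if parent_red is None:
        node.red = 0.0                       # root
    elif not node.children:
        node.red = 1.0                       # extant taxon
    else:
        a = node.length                      # branch to parent
        b = mean(leaf_distances(node))       # mean distance down to the leaves
        # Guard against zero-length paths to avoid dividing by zero.
        frac = a / (a + b) if (a + b) > 0 else 0.0
        node.red = parent_red + frac * (1.0 - parent_red)
    for child in node.children:
        assign_red(child, node.red)


# Toy tree: ((a1:1, a2:1)A:1, b:2)root
clade_a = Node("A", 1.0, [Node("a1", 1.0), Node("a2", 1.0)])
root = Node("root", 0.0, [clade_a, Node("b", 2.0)])
assign_red(root)
print(clade_a.red)
```

Because every value depends on the root position and on all branch lengths below a node, the same clade can receive a different RED in a different tree, which is why the comparison has to be made against other taxa of the same rank within the same tree.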

donovan-h-parks avatar Mar 13 '20 17:03 donovan-h-parks

Thank you for your reply.

Yes, my tree spans several phyla. I decorated the tree and predicted outliers as described in the README. Could you explain how I get from there to an answer to my question? It is not completely clear to me.

SilasK avatar Mar 19 '20 16:03 SilasK

It requires some manual curation. You need to annotate your tree with the families you suspect. You can then inspect the output of the outlier command to see if these families have RED values similar to those of other families. I appreciate this isn't an ideal solution for your situation, but PhyloRank isn't really meant to address this problem directly.
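The comparison step could be sketched roughly as follows, assuming the RED values for the reference families and for each candidate grouping have already been read from the outlier output by hand. All names and numbers are made up for illustration, and the tolerance of 0.1 is an assumed, adjustable threshold rather than a value taken from this thread.

```python
from statistics import median


def compare_to_reference(reference_red, candidate_red, tolerance=0.1):
    """Flag candidates whose RED lies within `tolerance` of the reference median.

    reference_red: RED values of established taxa at the rank of interest.
    candidate_red: mapping of candidate taxon name -> RED value.
    tolerance:     assumed threshold, for illustration only.
    """
    ref_median = median(reference_red)
    report = {}
    for name, red in candidate_red.items():
        delta = red - ref_median
        report[name] = {"red": red, "similar": abs(delta) <= tolerance}
    return ref_median, report


# Hypothetical RED values for established families in the same tree.
reference_families = [0.70, 0.74, 0.76, 0.78, 0.81]

# Hypothetical candidates: one family holding all three genomes,
# versus a split into two families.
candidates = {
    "f__Candidate_ABC": 0.55,
    "f__Candidate_AB": 0.75,
    "f__Candidate_C": 0.79,
}

ref_median, report = compare_to_reference(reference_families, candidates)
for name, row in report.items():
    print(name, row)
```

In this made-up case the clade containing all three genomes sits much closer to the root than a typical family, while the two-way split gives values close to the reference median. That would hint at two families, but the final call still needs manual inspection of the tree.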

donovan-h-parks avatar Mar 19 '20 18:03 donovan-h-parks

> Yes. Does your tree span an entire domain? Establishing ranks requires comparing the relative evolutionary divergence (RED) between taxa at the same defined rank (i.e. it is a measure relative to your specific tree and not an absolute measure). Assuming you have this, you need to first decorate your tree and then calculate the RED values as described in the README. We are hoping to provide a better solution in the future, but this is ongoing work.

Does that mean, for example, that if a family contains only 30 species whose genus-level classification is unclear, we cannot use PhyloRank to calculate RED and classify them at the genus level, because PhyloRank needs some established, closely related genera as a reference?

fujch7 avatar Apr 07 '20 17:04 fujch7

You need a sensible calibration point to determine suitable RED values for defining a genus. GTDB does this by calculating the median RED value of all well-defined bacterial or archaeal genera. Other approaches are certainly possible, but I haven't explored these in detail. Ultimately, I hope to incorporate an approach for resolving this issue into GTDB-Tk, but this is still in development.
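A calibration of this kind could be sketched as below: group the taxa by the GTDB-style rank prefix (`f__`, `g__`, and so on) and take the median RED per rank. The input dictionary is a hypothetical stand-in for values exported from a decorated tree, and the function name is invented for this example; how GTDB selects "well-defined" genera is not shown here.

```python
from collections import defaultdict
from statistics import median

# GTDB-style rank prefixes.
RANK_PREFIXES = {
    "d__": "domain", "p__": "phylum", "c__": "class", "o__": "order",
    "f__": "family", "g__": "genus", "s__": "species",
}


def median_red_by_rank(taxon_red):
    """Return the median RED per rank from a mapping of taxon name -> RED."""
    grouped = defaultdict(list)
    for taxon, red in taxon_red.items():
        rank = RANK_PREFIXES.get(taxon[:3])
        if rank is not None:                 # skip labels without a rank prefix
            grouped[rank].append(red)
    return {rank: median(values) for rank, values in grouped.items()}


# Hypothetical RED values for named taxa in one tree.
example = {
    "f__Fam1": 0.72, "f__Fam2": 0.76, "f__Fam3": 0.80,
    "g__Gen1": 0.90, "g__Gen2": 0.92, "g__Gen3": 0.94,
}

medians = median_red_by_rank(example)
print(medians)
```

The medians are only as good as the taxa that feed them, so a tree with few established genera near the group of interest gives a weak calibration point.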

donovan-h-parks avatar Apr 07 '20 20:04 donovan-h-parks

Hey, I've seen you updated GTDB-Tk. Does https://github.com/Ecogenomics/GTDBTk/pull/244 solve this issue?

SilasK avatar Jun 03 '20 13:06 SilasK

It aims to help answer such questions, though manual inspection and decision making are still required.

donovan-h-parks avatar Jun 03 '20 14:06 donovan-h-parks

Hello, I managed to run gtdbtk infer_ranks on the tree containing the reference genomes and my genomes. If I understand correctly, it puts the RED values on the tree.

I tried to open it with ete3 (Python), but it didn't understand the format. Could you point me to a tool to visualise and analyse the generated tree so that I can do the manual curation?

SilasK avatar Dec 17 '20 13:12 SilasK

Hi. You can visualize the output using Dendroscope.

donovan-h-parks avatar Dec 17 '20 16:12 donovan-h-parks