ggtree
Discrepancy between plotting tree inner labels and ggtree numbering
Prerequisites
- [ ] Have you read Feedback and followed the guide?
- [x] make sure you are using the latest release version
- [x] read the documents
- [x] google your question/issue
Describe your issue
- [x] Make a reproducible example (e.g. 1)
- [x] your code should contain comments to describe the problem (e.g. what expected and actually happened?)
Ask in right place
- [x] for bugs or feature requests, post here (github issue)
- [ ] for questions, please post to google group
Hello to ggtree community,
I want to plot a phylogenetic tree annotated with CAFE output (gene family expansions and contractions) at every node, both internal nodes and tips. After plotting, I see that the values attached to each node number match my CAFE table, but ggtree numbers the nodes differently from my Newick file, in a way I cannot work out.
For example, I have two files: a Newick file containing the tree, and a tab-separated file with the CAFE summary statistics, which I use to check the plotted tree. In the tab-separated file, node 58 corresponds to species A (so it must be a tip in the plotted tree) and has the values 0 (expansion) and 67 (contraction). When I plot the Newick file with ggtree, node 57 carries these values, but node 57 is not species A; it is an unrelated internal node. Species A is instead plotted as, say, node 3, with the CAFE values of node 3. As a result, the values end up on the wrong nodes of the tree. Could you explain how the nodes are numbered, or suggest another way to handle my CAFE output?
My commands are:
data <- read.table("CAFE_summary.txt", header =TRUE)
and it looks like this:
node  trait1  trait2  trait3
53    6916    6200    3284
17    0       316     296
13    3798    7127    2069
28    0       1662    1662
37    9924    596     9141
p2 %<+% data + geom_label(aes(label = trait1))
Your help will be valuable.
- [ ] Make a reproducible example (e.g. 1)
library(ggtree)
library(treeio)
library(tidytree)
data <- read.table("summary_node.txt", header =TRUE)
tree<-read.tree("tree.nwk")
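# Label every node with its ggtree node number so it can be compared with the CAFE table.
p2a <- ggtree(tree) +
  geom_tiplab(align = TRUE, linetype = "dashed", linesize = 0.3, offset = 0.02) +
  geom_label2(aes(label = node), size = 2, color = "darkred", alpha = 1)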
p2a %<+% data + geom_label(aes(label = fake_trait))
So, for every branch I want to plot trait1 and trait2 above the branch, and I want the same for the tips.
I think your issue is with the assumption that there is a standard for node numbering in the Newick format, which there is not. Each program chooses its own way to number nodes. ggtree stores trees in the format of the R package ape, which numbers nodes in the following manner:
- Tips are given node numbers 1..n (n being the number of tips) in the order that they appear in the Newick file, and
- Internal nodes are given node numbers n+1..m (m being the total number of nodes), again in the order they appear in the Newick file (i.e. in the order their '(' appear). This gives the root the node number n+1.
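As a rough illustration of the numbering described above, here is a minimal sketch using only ape. The four-tip Newick string and its internal labels (AB, CD, root) are made up for the example and are not taken from the tree in this issue.

```r
library(ape)

# Hypothetical four-tip tree with labelled internal nodes (illustration only).
nwk <- "((A,B)AB,(C,D)CD)root;"
tree <- read.tree(text = nwk)

n_tip <- Ntip(tree)   # number of tips (n)
n_int <- Nnode(tree)  # number of internal nodes

# Tips occupy node numbers 1..n, in the order they appear in the Newick string.
tips <- data.frame(node = seq_len(n_tip),
                   label = tree$tip.label,
                   stringsAsFactors = FALSE)

# Internal nodes occupy n+1..m; tree$node.label is stored in that same order,
# so the first entry belongs to the root (node n+1).
internals <- data.frame(node = n_tip + seq_len(n_int),
                        label = tree$node.label,
                        stringsAsFactors = FALSE)

numbering <- rbind(tips, internals)
print(numbering)
```

Printing a table like this for your own tree shows which node number ape, and therefore ggtree, has given to each tip and to each labelled internal node.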
You should check how the software CAFE numbers nodes so that you can link up the data to the tree in ggtree correctly.
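One possible way to do that linking, sketched under the assumption that the CAFE values can be keyed by tip or internal node label rather than by CAFE's own node IDs: build a label-to-node lookup from the ape tree and merge it with the data. The `cafe` table, its values, and the labels below are hypothetical placeholders, not real CAFE output.

```r
library(ape)

# Hypothetical tree with labelled internal nodes (illustration only).
tree <- read.tree(text = "((A,B)AB,(C,D)CD)root;")

# Hypothetical CAFE-style table keyed by label instead of by node number.
cafe <- data.frame(
  label  = c("A", "B", "C", "D", "AB", "CD", "root"),
  trait1 = c(0, 12, 5, 7, 3, 9, 0),
  trait2 = c(67, 4, 8, 1, 2, 6, 0),
  stringsAsFactors = FALSE
)

# ape node number for each label: tips first (1..n), then internal nodes (n+1..m).
lookup <- data.frame(
  label = c(tree$tip.label, tree$node.label),
  node  = seq_len(Ntip(tree) + Nnode(tree)),
  stringsAsFactors = FALSE
)

# Attach ape's node number to every CAFE row by matching on the label.
linked <- merge(lookup, cafe, by = "label")
linked <- linked[order(linked$node), c("node", "label", "trait1", "trait2")]
print(linked)
```

The resulting table carries ape's node numbers, so it should line up with the numbers that `geom_label2(aes(label = node))` prints on the tree. If CAFE identifies internal nodes only by its own IDs, a similar lookup would first have to be built from CAFE's labelled tree.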
Judging from the sample file, I don't think there is anything I can do about this.