missing factors in geom_barh with facet_plot(?)

Open xvazquezc opened this issue 5 years ago • 1 comments

Hi,

I've been trying to plot a bar chart of community composition alongside a hierarchical clustering dendrogram. The plot renders, but some of the factor levels are always missing from the bars even though they still appear in the legend. I first noticed this when plotting 10 factors, but it also happens with only 5. Files for a reproducible example: example_files.zip

library(ggtree)
library(tidyverse)
library(ggstance)

dist <- read.csv("dist.csv", header = TRUE, row.names = 1)
dist1 <- as.dist(dist)
hc <- hclust(dist1)
den <- as.dendrogram(hc)
p <- ggtree(den)

# load table with data for bar chart
mytable <- read.csv("mytable.csv", header = T, row.names = 1)

#put together
facet_plot(p, data = mytable, geom = geom_barh, panel = "Phylum", colour = "black",
           mapping = aes(fill = as.factor(Phylum),x=Abundance), stat = "identity") +
  geom_tiplab(size = 2) + xlim(0, 0.38)

This is the output, which in this simplified example is missing all Aquificae (which should be >50% in some samples): image

To be sure it wasn't a problem with mytable, I ran it through plain ggplot, both with geom_bar and with geom_barh from ggstance (only the latter is shown here):

ggplot(mytable, aes(y=LIMSID, x = Abundance)) +
  geom_barh(aes(fill=Phylum), position = "stack", stat = "identity")

and as you can see there are obvious differences: image

I did further testing by removing one of the factors in mytable$Phylum:

mytable2 <- mytable[mytable$Phylum != "Aquificae", ]
facet_plot(p, data = mytable2, geom = geom_barh, panel = "Phylum", colour = "black",
           mapping = aes(fill = as.factor(Phylum),x=Abundance), stat = "identity") +
  geom_tiplab(size = 2) + xlim(0, 0.38)

And now it shows the 4 remaining categories without issue: image, or so it seems... image

If I remove a different one, the offending "Aquificae" only shows up in a few samples where it is a very minor component:

mytable3 <- mytable[mytable$Phylum != "Proteobacteria", ]
facet_plot(p, data = mytable3, geom = geom_barh, panel = "Phylum", colour = "black",
           mapping = aes(fill = Phylum, x=Abundance), stat = "identity") +
  geom_tiplab(size = 2) + xlim(0, 0.38)

image image

I've run this on R 3.6.3 with both ggtree 2.0.4 (from Bioconductor) and 2.3.0 (from GitHub). I also usually get the following warnings, which I wouldn't be surprised to find are related. Removing the zeroes from my table makes no real difference: the number of rows removed barely changes.

Warning messages:
1: Removed 53 rows containing missing values (position_stack). 
2: Removed 82 rows containing missing values (geom_barh). 

I've also tried ordering the data in my table to match p$data$label, or to match the order of the leaves in the hclust output, but the result is the same.

Can anyone reproduce this? Is there anything obvious I'm missing?

Thank you in advance, Xabi

xvazquezc, Apr 30 '20 03:04

The problem is xlim. In your example xlim(0, 0.38) restricts the x range of every panel to 0 ~ 0.38, so any stacked geom_barh segment that extends past 0.38 is treated as missing and removed (hence the warnings). To solve the problem, do not use xlim; use geom_blank to widen the tree panel so that the tip labels are shown. PS: with a recent version of ggplot2 (3.3.0 or later) you can use geom_bar with orientation="y" instead of geom_barh from ggstance.
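
To illustrate why segments disappear, here is a rough base R sketch of the arithmetic. The abundances are made up for illustration (they are not taken from mytable.csv), and the real stacking order inside the plot may differ. It only shows that a stacked segment ends at a running total, which can exceed the xlim upper bound even when every individual value is small.

```r
# Hypothetical abundances for a single sample (made-up values)
abundance <- c(Proteobacteria = 0.15, Firmicutes = 0.20, Aquificae = 0.55)

# With stacking, each segment ends at the running total of the values so far
seg_end   <- cumsum(abundance)
seg_start <- seg_end - abundance

upper_limit <- 0.38  # the upper bound passed to xlim()

# A segment survives only if both of its ends lie inside the limits
kept <- seg_start >= 0 & seg_end <= upper_limit

print(data.frame(seg_start, seg_end, kept))
```

In this sketch the last segment ends at about 0.9, well beyond 0.38, so it is the one that would be dropped.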

library(ggtree)
library(tidyverse)

dist <- read.csv("dist.csv", header = TRUE, row.names = 1)
dist1 <- as.dist(dist)
hc <- hclust(dist1)
den <- as.dendrogram(hc)
# geom_blank widens the x range of the tree panel so the tip labels fit (used instead of xlim)
p <- ggtree(den) + geom_blank(aes(x=0.42))

# load table with data for bar chart
mytable <- read.csv("mytable.csv", header = T, row.names = 1)

#put together
facet_plot(p, data = mytable, geom = geom_bar, panel = "Phylum", colour = "black",
           mapping = aes(fill = as.factor(Phylum),x=Abundance), stat = "identity", orientation="y") +
  geom_tiplab(size = 2) 
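
If the example files are not at hand, the same effect can probably be seen without ggtree. The following is a minimal sketch using only ggplot2 (3.3.0 or later, for the orientation argument) and a made-up data frame; the names toy, sample, phylum and abundance are hypothetical and not part of the original example.

```r
library(ggplot2)

# Hypothetical toy data: S1 stacks up to 0.60, S2 stacks up to 0.30
toy <- data.frame(
  sample    = rep(c("S1", "S2"), each = 2),
  phylum    = rep(c("Aquificae", "Proteobacteria"), times = 2),
  abundance = c(0.35, 0.25, 0.10, 0.20)
)

# Horizontal stacked bars, analogous to the "Phylum" panel
bars <- ggplot(toy, aes(x = abundance, y = sample, fill = phylum)) +
  geom_col(orientation = "y", colour = "black")

# With xlim, the S1 segment that ends past 0.38 should be dropped with a warning
clipped <- bars + xlim(0, 0.38)

# Without xlim, all four segments are drawn
full <- bars

print(clipped)
print(full)
```

Comparing the two plots should show a segment missing from S1 in the clipped version while the legend still lists both phyla.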

xiangpin, Jun 15 '20 09:06