
VlnPlot removes violins below the threshold from the graphical output

Open vkavaka opened this issue 2 years ago • 9 comments

Dear Seurat team,

while exploring some genes that look quite specific on the VlnPlots, we noticed when looking through ridge plots that in some cases the violins are missing from the graphical output if they are below a threshold. Here is an example of the violin plot with the standard VlnPlot function: [Screenshot 2022-03-18 at 11 52 41] Here is the output when plotting the same gene with ggplot2's geom_violin. As you can see, the violins in groups 1 and 4 look the same, but those in groups 2 and 3 now appear. [Screenshot 2022-03-18 at 11 52 06] Why does VlnPlot cut off the two groups in the middle? What do you think about this potentially misleading visualization?

vkavaka avatar Mar 18 '22 11:03 vkavaka

Hi @vkavaka, could you post a reproducible example for this VlnPlot issue? You may use pbmc_small, any dataset in SeuratData, or any public data. Thanks.

yuhanH avatar Mar 18 '22 19:03 yuhanH

Dear @yuhanH, thank you for your prompt reply. We created the reproducible example using the pbmc3k dataset. Here is the code:

library(Seurat)
library(SeuratData)
library(ggplot2)
# Assumption: pbmc3k is loaded through SeuratData; the original post did not show this step
pbmc <- LoadData("pbmc3k")
# Standard QC, normalization, and clustering
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
pbmc <- NormalizeData(pbmc)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)
# Violin plot with Seurat's VlnPlot (points hidden)
VlnPlot(pbmc, "NKG7", pt.size = 0)
# Same gene plotted directly with ggplot2 from the normalized data
vln_df <- data.frame(NKG7 = pbmc[["RNA"]]@data["NKG7", ], cluster = pbmc$seurat_clusters)
ggplot(vln_df, aes(x = cluster, y = NKG7)) + geom_violin(aes(fill = cluster), trim = TRUE, scale = "width")

Here is the violin plot using VlnPlot: [Screenshot 2022-03-20 at 11 43 28] And the same with ggplot2 (as you can see, the violins below the cutoff start to appear): [Screenshot 2022-03-20 at 11 43 33]

Session info: R version 4.1.2 (2021-11-01) ggplot2_3.3.5 SeuratData_0.2.1 SeuratObject_4.0.4 Seurat_4.0.6

vkavaka avatar Mar 20 '22 10:03 vkavaka

@yuhanH as a possible reason: we suspect that the noise built into the VlnPlot function removes the violins from the graphical output. We would be very happy to read your opinion on this.

vkavaka avatar Mar 21 '22 10:03 vkavaka

Dear @yuhanH, do you have any updates on this? In our opinion, the issue is quite important and could lead to misinterpretation of "specific looking" results.

vkavaka avatar Mar 24 '22 10:03 vkavaka

hi @vkavaka Thanks for showing this reproducible example. I agree with you that the change of the violin plots is related to the noise.

vln_df = data.frame(NKG7 = pbmc[["RNA"]]@data["NKG7",], cluster = pbmc$seurat_clusters)
noise <- rnorm(n = length(x =vln_df$NKG7)) / 100000
vln_df$NKG7.noise <- vln_df$NKG7  + noise
ggplot(vln_df, aes(x = cluster, y = NKG7)) + geom_violin(aes(fill = cluster), trim=TRUE, scale = "width")  
ggplot(vln_df, aes(x = cluster, y = NKG7.noise)) + geom_violin(aes(fill = cluster), trim=TRUE, scale = "width")  

image

You can also see that the noise is very small and mainly introduces only a very small variation in the data.
[image] I am not sure why it affects the violin shapes so strongly. It seems to be an issue related to geom_violin. But I also agree that it may lead to misinterpretation of specific-looking results. It suggests that it is better to keep showing the data points in the violin plot. [image]
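
One possible explanation for this sensitivity, offered as a sketch rather than a confirmed diagnosis: by default geom_violin estimates each group's density with the "nrd0" bandwidth rule (stats::bw.nrd0). That rule scales with the smaller of the standard deviation and IQR/1.34, and falls back to the standard deviation only when the IQR is exactly zero. In a group where most cells have zero expression, the raw IQR is exactly zero, but after adding noise it becomes tiny and nonzero, so the bandwidth collapses. The vector x below is synthetic and hypothetical, not taken from pbmc3k.

```r
set.seed(42)

# Hypothetical zero-inflated expression values: 90% exact zeros, 10% positive
x <- c(rep(0, 900), runif(100, min = 0.5, max = 3))

# Same kind of noise as in the snippet above
noise <- rnorm(n = length(x)) / 100000

# Default bandwidth rule used by density(), and by geom_violin unless bw is set
bw_raw <- bw.nrd0(x)            # IQR is exactly 0, so the rule falls back to sd(x)
bw_noisy <- bw.nrd0(x + noise)  # IQR is tiny but nonzero, so it drives the bandwidth

cat("bandwidth without noise:", bw_raw, "\n")
cat("bandwidth with noise:   ", bw_noisy, "\n")
```

If this is what happens, then with scale = "width" the density of such a group is dominated by a very narrow spike at zero, which would leave the rest of the violin too thin to see.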

yuhanH avatar Mar 24 '22 18:03 yuhanH

@yuhanH thank you for your reply and suggestion. Would you still consider keeping the noise in the VlnPlot function? The only clusters that are affected seem to be the ones with lower expression; the ones with higher expression are completely unchanged.
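
That only the low-expression clusters change would be consistent with a bandwidth effect in the density estimate. The default "nrd0" rule (stats::bw.nrd0) uses the smaller of the standard deviation and IQR/1.34, unless the IQR is exactly zero, which is the case when more than 75% of the values are zero. A minimal sketch with two synthetic groups follows; the zero fractions in frac_zero are hypothetical and not measured from pbmc3k.

```r
set.seed(1)

# Hypothetical fraction of cells with zero expression in two groups
frac_zero <- c(high = 0.2, low = 0.9)

# Ratio of the default bandwidth without noise to the bandwidth with noise
bw_ratio <- sapply(frac_zero, function(p) {
  n <- 1000
  n_zero <- round(n * p)
  x <- c(rep(0, n_zero), runif(n - n_zero, min = 0.5, max = 3))
  bw.nrd0(x) / bw.nrd0(x + rnorm(n) / 100000)
})

print(bw_ratio)
```

A ratio close to 1 means the noise leaves the density estimate essentially untouched; a very large ratio means the noise shrinks the bandwidth dramatically for that group.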

We are not sure whether showing the cell points is the best way to overcome this bias, especially with a lot of cells in the object. We noticed that below a certain value you cannot reduce the size of the dots any further with the pt.size argument of VlnPlot. Any ideas on how to overcome this limitation and print the dots even smaller?

vkavaka avatar Mar 24 '22 18:03 vkavaka

Right. When the number of cells is big, you may consider changing the alpha value for the points. For example:

p0 <- VlnPlot(pbmc, "NKG7")
p1 <- VlnPlot(pbmc, "NKG7")
p1$layers[[2]]$aes_params$alpha <- 0.1
p0+p1

We will add this alpha parameter to VlnPlot soon. [image]

yuhanH avatar Mar 24 '22 19:03 yuhanH

@yuhanH thank you for the hint about the alpha values. And what do you think about the noise? I understand that the developers would not have added it if it were not necessary. But as you can see in this example, it may affect the data visualization. Is there an explanation for why the noise should be kept?

vkavaka avatar Mar 24 '22 19:03 vkavaka

Hi @vkavaka, without the noise, the distribution of low expression values in the original data appears to fit the dots in the plot less well. For now, we are retaining this noise. However, we remain open to reconsidering and possibly removing it if clear biases emerge as a consequence of this noise.

yuhanH avatar Apr 22 '22 18:04 yuhanH

hi

yuhanH avatar Jul 06 '23 20:07 yuhanH