fill_by for violin plots

Open alanocallaghan opened this issue 2 years ago • 7 comments

See discussion in #174

How should this handle positions? For example, the last few examples below each look sub-optimal in a different way.

library("scater")
example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
colData(example_sce) <- cbind(colData(example_sce), perCellQCMetrics(example_sce))
plotColData(example_sce, y = "detected", x = "Cell_Cycle")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Cell_Cycle")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Cell_Cycle", fill_by = "Cell_Cycle")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", point_fun = function(...) list(), fill_by="Mutation_Status")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", fill_by="Mutation_Status")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", fill_by="Mutation_Status", colour_by = "Mutation_Status")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", fill_by="Cell_Cycle", colour_by = "Mutation_Status")

plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by="Cell_Cycle", fill_by = "Mutation_Status")

alanocallaghan, Nov 03 '22 13:11

See branch fill-by

alanocallaghan, Nov 03 '22 16:11

That's perfect!! Thanks a lot for the nice examples!

kikegoni, Nov 04 '22 09:11

Just as a suggestion (I can adapt it from your code), it would be great to add an option to modify the alpha = 0.2 parameter used for the fill_by violins here:

plot_out <- plot_out + do.call(geom_violin, c(viol_args, list(colour = "gray60", alpha = 0.2, scale = "width", width = 0.8)))
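A minimal sketch of how that could look, assuming plain ggplot2. The helper add_violin_layer() and its violin_alpha argument are hypothetical names used for illustration only; they are not part of scater, and the toy data frame stands in for the real plotting data.

```r
library(ggplot2)

# Hypothetical helper: same geom_violin() call as above, but with the
# alpha value exposed as an argument instead of being hard-coded.
add_violin_layer <- function(plot_out, viol_args = list(), violin_alpha = 0.2) {
    plot_out + do.call(
        geom_violin,
        c(viol_args, list(colour = "gray60", alpha = violin_alpha,
                          scale = "width", width = 0.8))
    )
}

# Toy data standing in for the per-cell data frame.
set.seed(1)
df <- data.frame(
    group = factor(rep(c("a", "b"), each = 50)),
    value = c(rnorm(50), rnorm(50, mean = 1))
)

base_plot <- ggplot(df, aes(x = group, y = value))
p <- add_violin_layer(
    base_plot,
    viol_args = list(mapping = aes(fill = group)),
    violin_alpha = 0.5
)
```

Keeping 0.2 as the default would leave existing plots unchanged.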

kikegoni, Nov 04 '22 09:11

I don't like the behaviour shown here except when the fill_by, x, and colour_by arguments all match. However, dodging the jittered points means setting the dodge and jitter widths to be similar to those of the violin plots, and choosing how to group the points (probably the same as fill_by). That would, I guess, also mean exposing a group_by arg and dodge_width, jitter_width...
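As a rough sketch of what aligned dodging could look like in plain ggplot2 (not the scater implementation): the points are grouped by the same variable as the violin fill and dodged with the same width. The data frame is a toy stand-in for the mockSCE() column data, and dodge_width and jitter_width are illustrative variable names, not existing arguments.

```r
library(ggplot2)

# Toy stand-in for the per-cell data used in the examples above.
set.seed(1)
df <- data.frame(
    Cell_Cycle = factor(sample(c("G0", "G1", "G2M", "S"), 200, replace = TRUE)),
    Mutation_Status = factor(sample(c("negative", "positive"), 200, replace = TRUE)),
    detected = rnorm(200, mean = 1500, sd = 100)
)

dodge_width <- 0.8   # shared by the violins and the points
jitter_width <- 0.2  # horizontal jitter within each dodged group

p <- ggplot(df, aes(x = Cell_Cycle, y = detected, fill = Mutation_Status)) +
    geom_violin(
        colour = "gray60", alpha = 0.2, scale = "width", width = 0.8,
        position = position_dodge(width = dodge_width)
    ) +
    # shape 21 has a fill, so the points inherit the fill grouping and
    # position_jitterdodge() can dodge them alongside the violins.
    geom_point(
        shape = 21, alpha = 0.6,
        position = position_jitterdodge(
            jitter.width = jitter_width,
            dodge.width = dodge_width
        )
    )
```

position_jitterdodge() needs a discrete aesthetic such as fill or colour to dodge by, which is one reason the point grouping would probably have to follow fill_by.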

alanocallaghan, Nov 04 '22 15:11

Hi,

It seems that fill_by raises an error when used like this:

library("scater")
example_sce <- mockSCE()
example_sce <- logNormCounts(example_sce)
colData(example_sce) <- cbind(colData(example_sce), perCellQCMetrics(example_sce))
> plotColData(example_sce, y = "detected", x = "Cell_Cycle", colour_by = "Cell_Cycle", fill_by = "Mutation_Status")
Error:
! Problem while computing aesthetics.
i Error occurred in the 1st layer.
Caused by error in `.data[["Mutation_Status"]]`:
! Column `Mutation_Status` not found in `.data`.
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/rlang_error>
Error:
! Problem while computing aesthetics.
i Error occurred in the 1st layer.
Caused by error in `.data[["Mutation_Status"]]`:
! Column `Mutation_Status` not found in `.data`.
---
Backtrace:
     x
  1. +-base (local) `<fn>`(x)
  2. +-ggplot2:::print.ggplot(x)
  3. | +-ggplot2::ggplot_build(x)
  4. | \-ggplot2:::ggplot_build.ggplot(x)
  5. |   \-ggplot2:::by_layer(...)
  6. |     +-rlang::try_fetch(...)
  7. |     | +-base::tryCatch(...)
  8. |     | | \-base (local) tryCatchList(expr, classes, parentenv, handlers)
  9. |     | |   \-base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 10. |     | |     \-base (local) doTryCatch(return(expr), name, parentenv, handler)
 11. |     | \-base::withCallingHandlers(...)
 12. |     \-ggplot2 (local) f(l = layers[[i]], d = data[[i]])
 13. |       \-l$compute_aesthetics(d, plot)
 14. |         \-ggplot2 (local) compute_aesthetics(..., self = self)
 15. |           \-ggplot2:::scales_add_defaults(...)
 16. |             \-base::lapply(aesthetics[new_aesthetics], eval_tidy, data = data)
 17. |               \-rlang (local) FUN(X[[i]], ...)
 18. +-Mutation_Status
 19. +-rlang:::`[[.rlang_data_pronoun`(.data, "Mutation_Status")
 20. | \-rlang:::data_pronoun_get(...)
 21. \-rlang:::abort_data_pronoun(x, call = y)
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936 
[2] LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] scater_1.27.9               ggplot2_3.4.2              
 [3] scuttle_1.4.0               SingleCellExperiment_1.16.0
 [5] SummarizedExperiment_1.24.0 Biobase_2.54.0             
 [7] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
 [9] IRanges_2.28.0              S4Vectors_0.32.4           
[11] BiocGenerics_0.40.0         MatrixGenerics_1.6.0       
[13] matrixStats_0.63.0          devtools_2.4.5             
[15] usethis_2.1.6              

loaded via a namespace (and not attached):
 [1] bitops_1.0-7              fs_1.5.2                  tools_4.1.0              
 [4] profvis_0.3.7             utf8_1.2.3                R6_2.5.1                 
 [7] irlba_2.3.5.1             vipor_0.4.5               DBI_1.1.3                
[10] colorspace_2.1-0          urlchecker_1.0.1          withr_2.5.0              
[13] gridExtra_2.3             tidyselect_1.1.2          prettyunits_1.1.1        
[16] processx_3.7.0            compiler_4.1.0            cli_3.4.1                
[19] BiocNeighbors_1.12.0      DelayedArray_0.20.0       scales_1.2.1             
[22] callr_3.7.2               stringr_1.4.1             digest_0.6.29            
[25] XVector_0.34.0            pkgconfig_2.0.3           htmltools_0.5.3          
[28] sessioninfo_1.2.2         sparseMatrixStats_1.6.0   fastmap_1.1.0            
[31] htmlwidgets_1.5.4         rlang_1.1.1               rstudioapi_0.13          
[34] shiny_1.7.2               DelayedMatrixStats_1.16.0 generics_0.1.3           
[37] BiocParallel_1.28.3       dplyr_1.0.9               RCurl_1.98-1.12          
[40] magrittr_2.0.3            BiocSingular_1.10.0       GenomeInfoDbData_1.2.7   
[43] Matrix_1.3-4              Rcpp_1.0.10               ggbeeswarm_0.7.2         
[46] munsell_0.5.0             fansi_1.0.4               viridis_0.6.3            
[49] lifecycle_1.0.3           stringi_1.7.8             zlibbioc_1.40.0          
[52] pkgbuild_1.4.0            grid_4.1.0                parallel_4.1.0           
[55] promises_1.2.0.1          ggrepel_0.9.3             crayon_1.5.1             
[58] miniUI_0.1.1.1            lattice_0.20-45           cowplot_1.1.1            
[61] beachmat_2.10.0           ps_1.6.0                  pillar_1.9.0             
[64] ScaledMatrix_1.2.0        pkgload_1.3.0             glue_1.6.2               
[67] remotes_2.4.2             vctrs_0.6.2               httpuv_1.6.5             
[70] gtable_0.3.3              purrr_0.3.4               assertthat_0.2.1         
[73] cachem_1.0.5              rsvd_1.0.5                mime_0.12                
[76] xtable_1.8-4              later_1.3.0               viridisLite_0.4.2        
[79] tibble_3.2.1              beeswarm_0.4.0            memoise_2.0.1            
[82] ellipsis_0.3.2 
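
For context, the message is a generic ggplot2/rlang error raised when a layer maps an aesthetic to a column that is missing from the data frame it receives. The sketch below reproduces that class of error in plain ggplot2 with a toy data frame. Whether the same thing happens inside plotColData in this session is an assumption, not a diagnosis.

```r
library(ggplot2)

# Toy data without a Mutation_Status column.
df <- data.frame(
    Cell_Cycle = factor(rep(c("G0", "G1"), each = 10)),
    detected = seq_len(20)
)

# The fill aesthetic refers to a column that is absent from df.
p <- ggplot(df, aes(x = Cell_Cycle, y = detected)) +
    geom_violin(aes(fill = .data[["Mutation_Status"]]))

# Aesthetics are only evaluated when the plot is built, so the error
# appears at print/build time rather than when the plot is created.
res <- tryCatch(
    ggplot_build(p),
    error = function(e) conditionMessage(e)
)
```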

shangguandong1996, May 23 '23 01:05

It would be nice if plotExpression also supported the fill_by argument.

Yunuuuu, Dec 19 '23 17:12

I attempted to implement it, but adding this to plotExpression would complicate the function significantly, because user inputs are hard to predict, especially when the group aesthetic is used for the violin plot. I therefore ended up using makePerCellDF instead. However, I am unsure whether it is worth adding a function that returns the data in long format for plotting.

library(ggplot2)

# sce_object, markers and n_col are assumed to be defined already;
# "label" and "celltypes" are assumed to be colData columns of sce_object.
data <- scuttle::makePerCellDF(sce_object, features = markers)
# Reshape to long format: one row per cell/feature pair.
data <- tidyr::pivot_longer(data,
        cols = tidyr::all_of(markers),
        names_to = "Feature",
        values_to = "logcounts"
)
violin_plot <- ggplot(data, aes(factor(label), logcounts)) +
        geom_violin(aes(fill = celltypes), scale = "width", width = 0.8) +
        scale_fill_brewer(type = "qual", palette = "Set3") +
        guides(fill = guide_legend(
            title = "Cell type", override.aes = list(size = 2L), ncol = 1L
        )) +
        labs(x = NULL) +
        # One panel per feature
        facet_wrap(vars(Feature),
            ncol = n_col, scales = "free_x"
        ) +
        cowplot::theme_cowplot(font_size = 10L) +
        theme(axis.text.x = element_text(size = 6L))
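
A possible shape for such a long-format helper, as a sketch only: make_long_cell_df() is a hypothetical name and is not part of scater or scuttle. It wraps the two reshaping steps above and is shown on the mockSCE() example data.

```r
library(scuttle)
library(tidyr)

# Hypothetical helper: per-cell data frame in long format, with one row
# per cell/feature pair and the expression values in a single column.
make_long_cell_df <- function(sce, features, assay.type = "logcounts") {
    df <- makePerCellDF(sce, features = features, assay.type = assay.type)
    pivot_longer(
        df,
        cols = all_of(features),
        names_to = "Feature",
        values_to = assay.type
    )
}

# Usage on mock data: three features, so three rows per cell.
sce <- logNormCounts(mockSCE())
long_df <- make_long_cell_df(sce, features = rownames(sce)[1:3])
```

The result keeps the colData columns alongside Feature and logcounts, so it can be passed straight to ggplot() with any fill or group mapping.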

Yunuuuu, Dec 19 '23 18:12