Changing how a SumarizedExperiment gets displayed should not be a side effect of loading the package
Hi @stemangiola,
This is how a SummarizedExperiment object gets displayed before loading tidySummarizedExperiment:
library(SummarizedExperiment)
example(SummarizedExperiment)
se0
# class: SummarizedExperiment
# dim: 200 6
# metadata(0):
# assays(1): counts
# rownames: NULL
# rowData names(0):
# colnames(6): A B ... E F
# colData names(1): Treatment
And this is how it gets displayed after loading tidySummarizedExperiment:
library(tidySummarizedExperiment)
Loading required package: ttservice
se0
# # A SummarizedExperiment-tibble abstraction: 1,200 × 4
# # Features=200 | Samples=6 | Assays=counts
# .feature .sample counts Treatment
# <chr> <chr> <dbl> <chr>
# 1 1 A 9.66 ChIP
# 2 2 A 9.71 ChIP
# 3 3 A 9.50 ChIP
# 4 4 A 8.71 ChIP
# 5 5 A 9.72 ChIP
# 6 6 A 9.37 ChIP
# 7 7 A 6.34 ChIP
# 8 8 A 9.89 ChIP
# 9 9 A 9.62 ChIP
# 10 10 A 9.14 ChIP
# # ℹ 40 more rows
# # ℹ Use `print(n = ...)` to see more rows
This is not good behavior. See here and here for why.
Please address at your earliest convenience. I would recommend coordinating with the biocmask authors to avoid duplicated effort and ensure consistency between tidySummarizedExperiment and biocmask.
Thanks, H.
sessionInfo():
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 23.10
Matrix products: default
BLAS: /home/hpages/R/R-4.4.0/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/Los_Angeles
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] ggplot2_3.5.1 tidyr_1.3.1
[3] dplyr_1.1.4 tidySummarizedExperiment_1.15.1
[5] ttservice_0.4.1 SummarizedExperiment_1.35.3
[7] Biobase_2.65.1 GenomicRanges_1.57.1
[9] GenomeInfoDb_1.41.1 IRanges_2.39.2
[11] S4Vectors_0.43.2 BiocGenerics_0.51.3
[13] MatrixGenerics_1.17.0 matrixStats_1.4.1
loaded via a namespace (and not attached):
[1] plotly_4.10.4 utf8_1.2.4 generics_0.1.3
[4] SparseArray_1.5.41 stringi_1.8.4 lattice_0.22-6
[7] digest_0.6.37 magrittr_2.0.3 grid_4.4.0
[10] fastmap_1.2.0 jsonlite_1.8.9 Matrix_1.7-0
[13] httr_1.4.7 purrr_1.0.2 fansi_1.0.6
[16] viridisLite_0.4.2 UCSC.utils_1.1.0 scales_1.3.0
[19] lazyeval_0.2.2 abind_1.4-8 cli_3.6.3
[22] rlang_1.1.4 crayon_1.5.3 XVector_0.45.0
[25] ellipsis_0.3.2 munsell_0.5.1 withr_3.0.1
[28] DelayedArray_0.31.13 S4Arrays_1.5.10 tools_4.4.0
[31] colorspace_2.1-1 GenomeInfoDbData_1.2.13 vctrs_0.6.5
[34] R6_2.5.1 lifecycle_1.0.4 stringr_1.5.1
[37] zlibbioc_1.51.1 htmlwidgets_1.6.4 pkgconfig_2.0.3
[40] pillar_1.9.0 gtable_0.3.5 glue_1.8.0
[43] data.table_1.16.0 tibble_3.2.1 tidyselect_1.2.1
[46] htmltools_0.5.8.1 compiler_4.4.0
Hello Herve, one solution would be to create an overall display package, with the explicit goal of change display method.
Isolating that from tidySE and biomask, will have the nice effect of making them interoperable.
I am discussing this with Michael Love.
Hi Stefano,
See for a cleaner OO approach here: https://github.com/Bioconductor/Contributions/issues/3616#issuecomment-2423413619
The idea is to formalize the "SummarizedExperiment-tibble Abstraction" by wrapping the SummarizedExperiment object in an S4 shell. Maybe tidySummarizedExperiment, tidySingleCellExperiment, and the Bioconductor tidy stack in general, could do something like this? A SummarizedExperiment object is not a data-frame-like object so it's conceptually wrong to pass it naked to a dplyr verb.
Let me know what you think.
Best, H.
Thanks Herve,
it seems elegant. if I understand it would require a new class. Do you mean that through LFSummarizedExperiment dplyr would NOT know that this is a SE object? Under the hood, we define dplyr verbs to often act on colData or rowData directly to gain efficiency. For example, if I filter for feature-wise columns, I internally act on rowData, and the same for colData. I don't think I fully understand the setup and if I am missing something here.
As a note about the philosophy (even if we forget the tidy data display), the idea of the tidy stack is to function as a transparent plugin for existing analysis pipelines, which that would not change any of its behaviour. For example, changing the meaning of dim() would change the meaning of some code. Also, the idea is that if I transfer my SE object to my collaborator who does not use tidy, they would have no trace that tidyomics was used, and neither would the SE object. Does this reasoning make sense?
This has been addressed in the new version and with the tidyprint package
https://github.com/Bioconductor/Contributions/issues/3934
Thanks Stefano. I'm a little bit confused though. The tidyprint package changes again the default display of SE objects. How does that address the original issue? I appreciate the fact that tidySummarizedExperiment no longer does this but my original point was that nobody should do this.
Thanks @hpages for your message. We agreed that changing the display of an object as a passive consequence of loading a manipulation package could be undesired (although no complaints from the community emerged). Therefore, we have heavily restructured our framework during the last year to satisfy that concern.
We believe our solution of creating a package (tidyprint) whose sole purpose is to abstract an object would make it a very conscious choice for each user, with no consequences or traces for any downstream analyst. Therefore, while there are benefits for the users who opt in, in practice, there are no downsides for any other package/ecosystem. As a note, arguably, this is less invasive than creating yet another class and much better for the user than calling a show function at every instance, while this is the sole purpose of the package.
I tag @mikelove for knowledge and further feedback.
Provided that these principles have already been accepted by the community, as described in our publication, and no side effects have been reported, we hope tidyprint can soon be accepted as a solution. In the future, we will be happy to reevaluate this solution if problems arise.
Displacing the problem to a dedicated package doesn't change the fact that, if we accept this new package, we will again have a package in the ecosystem that changes the default display of a SummarizedExperiment object or derivative. I will argue again that this is not a good idea. At any moment the end user will not recognize the object they were working with just because at some point in their workflow they had to load some package X that depends on tidyprint. Good for users who consciously made that choice by explicitely loading tidyprint, but not so good for all the other users.
Of course this won't happen soon because right now tidyprint is a terminal package (i.e. nothing depends on it), but it will happen as soon as tidyprint is no longer a terminal package. So even though accepting tidyprint now wouldn't have immediate adverse effect, it would be planting the seed for problems in the future.
More generally speaking, replacing methods defined in other packages with your own is (almost) never a good idea.
To further avoid non-explicit choice, we can implement a check
is_attached <- function(pkg = "mypkg") {
paste0("package:", pkg) %in% search()
}
upstream of the abstraction. In this way, only conscious loading on the package would have an effect.
Not sure I understand how this would work exactly. More precisely I don't know how you can programatically distinguish between a package that lands on the search path because the user explicitely requested it vs a package that lands on the search path as a side effect of loading another package. I'm skeptical that this is actually doable without using some extremely hacky and fragile trick.
Also I'm not sure what's invasive about the wrapper class solution. That's what they've done in plyxp so you don't have to do it again. I'm assuming here that you would coordinate with the plyxp folks to make the necessary arrangements to move the wrapper class to a place that makes it easy for the two package to reuse. If for whatever reason this kind of sharing is not an option, it's easy to implement your own wrapper class. Note that they use new_plyxp() in plyxp to wrap the SE object (or derivative), but it seems that using a generic function with a generic name like as_tibble_abstraction() would make sense. Then you need methods for SummarizedExperiment and other Bioconductor classes for which you want to provide a tibble abstraction. Each method would return the original object wrapped into something that looks and feels like a tibble. Maybe someone has done something like that already and I'm not aware.
If the wrapper class is still not an option, please make sure that the end user explicitely opts in for seeing their SE objects disfigured. You could do this by requiring them to explicitely turn on a global option. Once this option is on, the new SE display should make it clear that what they see is still an SE (or derivative object) just displayed in an unusual way. It should also print a note that explains how to turn this off, just in case this was actually turned on by some code in some other package, and not on the user request. Not sure you need a new dedicated package for that.
Thanks
Thanks Herve. I understand your point of view.
I consider option() a feasible solution, though clunky, for solving a problem that, arguably, will never arise. On the contrary, creating shell classes would go against the non-invasiveness, no traces principles of tidyomics. Arguably much more impactful than an unlikely involuntary change of appearance.
Before implementing such a solution, I want to make sure no simpler solution is possible.
Please have a look at this example, which shows that it is quite straightforward to separate, explicitly library(), from @import from another package. This should really cover the most common dependency definition.
So I would like to ask to let us try to implement this clean solution, and if unlikely side effects should arise, we will be happy to pivot to one of the solutions you proposed.
We will, of course, implement all the messaging about deactivating tidy print as you were mentioning. All documentation and future workshops will also mention it.
> # Install packages via remotes
> remotes::install_github("tidyomics/tidyprint")
Using GitHub PAT from the git credential store.
Skipping install of 'tidyprint' from a github remote, the SHA1 (5a34a80b) has not changed since last install.
Use `force = TRUE` to force installation
> remotes::install_github("stemangiola/testtidyprintimport")
Using GitHub PAT from the git credential store.
Downloading GitHub repo stemangiola/testtidyprintimport@HEAD
── R CMD build ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/private/var/folders/7w/rg5zrgd17dn9pfwgpbgwdvxc0000gp/T/RtmprdHeWb/remotes775c31407667/stemangiola-testtidyprintimport-4e142c937fa81199ca9fd72ed0247d65db0353ac/DESCRIPTION’ ...
─ preparing ‘testtidyprintimport’:
✔ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
Omitted ‘LazyData’ from DESCRIPTION
─ creating default NAMESPACE file
─ building ‘testtidyprintimport_0.1.0.tar.gz’
Warning message:
package ‘BiocManager’ was built under R version 4.5.1
* installing *source* package ‘testtidyprintimport’ ...
** this is package ‘testtidyprintimport’ version ‘0.1.0’
** using staged installation
** help
No man pages found in package ‘testtidyprintimport’
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (testtidyprintimport)
>
> # Define helper function to check if package is attached
> is_attached <- function(pkg = "mypkg") {
+ paste0("package:", pkg) %in% search()
+ }
>
> # Load testtidyprintimport
> library(testtidyprintimport)
>
> # Check if tidyprint is attached (should be FALSE - imported but not attached)
> is_attached("tidyprint")
[1] FALSE
>
> # Explicitly load tidyprint
> library(tidyprint)
>
> # Check again if tidyprint is attached (should be TRUE)
> is_attached("tidyprint")
[1] TRUE
But what if tidyprint ends up on the search path because another package has it in Depends? As I wrote earlier, I'm not sure how you would be able to programatically distinguish between a package that lands on the search path because the user explicitely requested it vs a package that lands on the search path as a side effect of loading another package.
Thanks, Herve, I appreciate your carefulness.
I believe that a printing package in the Depends field goes against Bioconductor guidelines.
https://bioconductor.org/help/faq/
Depends: is appropriate for packages that cannot be imported (e.g., because they do not have a NAMESPACE) or for packages that provide essential functionality needed by the user of your package, e.g., your functions always return GRanges objects, so the user will necessarily need GenomicRanges on their search path.
I believe this is quite understood within the community, making this probability vanishingly tiny
Considering that
- We are now talking about potential issues with a vanishingly small probability, and
- The current proposed solution is a way in between the old framework (which it improves thanks to your feedback), and your recent solution, and there is always an opportunity to progress in that direction
Could we pilot the improved implementation (with the checks and messaging we mentioned) in Bioconductor, closely monitoring the situation in case anything arises?
If a package A decides to depend on tidyprint it's because they want the printing functionality provided by tidyprint. Now if you disable that functionality when tidyprint gets loaded only and not attached, where do you think the authors of package A are going to put tidyprint: in Imports or in Depends?
In other words the is_attached() trick is not going to work, and it's not hard to see why.
I regret that we are not able to make more progress on this. I will only reiter that no package should alter by default the display of objects that they don't own. This is something that the end user must explicitly opt-in, and it should be easy for them to opt-out at anytime. It's a very reasonable request and it's easy to implement. Can the conversation focus on how to achieve that?