anndataR icon indicating copy to clipboard operation
anndataR copied to clipboard

Tidy user interface

Open lazappi opened this issue 1 year ago • 7 comments
trafficstars

Tidy the user interface to reduce the exported functions to ones we think users should see.

Changes:

  • [x] Un-export to/from functions
  • [x] Modify AnnData() to accept SCE/Seurat (replacing from_* functions)
  • [x] Minor updates to vignette
  • [x] Creation of man pages for conversion (so arguments are visible)
  • [x] Check default returned objects
  • [x] Update exported examples to use generate_dataset()

Todo:

  • [ ] Move in-memory stuff to separate vignette (maybe)
  • [ ] Add tests/examples for AnnData() with SCE/Seurat input
  • [ ] Check read_h5ad()/write_h5ad() tests

lazappi avatar Dec 05 '23 16:12 lazappi

The test/example stuff I would like to do but requires a working Seurat converter. I think maybe it's easier to merge this first? The vignette would be nice but can be done later.

lazappi avatar Dec 06 '23 16:12 lazappi

  • Why would a user ever need to create a HDF5AnnData? The read/write functions handle going to/from files and I can't think of a reason to interact with one of these objects directly
  • I thought about doing the S3 version, happy to switch to that. Whether it is one function or two I guess is a design decision.
  • The AnnData() function I think is the same same as in the R {anndata} package. I'm not sure how much of an issue that is but maybe we should avoid clashing just in case?

lazappi avatar Dec 11 '23 13:12 lazappi

@lazappi Just to confirm, after what we discussed, do you agree with the following?

  • AnnData() will return an InMemoryAnnData (default) or an HDF5AnnData if the user really wants to. It should be noted that this is only for users who know what they're doing.
  • Users can already convert their anndata using adata$to_SingleCellExperiment() or adata$to_InMemoryAnnData(), but it would be nice if this is also possible with as(adata, "SingleCellExperiment").
  • Make sure internal classes (InMemoryAnnData and HDF5AnnData) and internal functions (from_* and to_*) are not exported.
  • Users are recommended to use read_h5ad() and write_h5ad() to write their data from/to .h5ad files

I'd like to merge this PR because I agree that it'd be nice to clean up our list of exports. Would you be able to make the changes to AnnData()?

rcannood avatar Dec 14 '23 08:12 rcannood

Yes, I think so. I'll try to work on it, assuming I can work out how to implement as() properly.

lazappi avatar Dec 19 '23 09:12 lazappi

Ok, so using as() probably won't work for us because there is no way to provide additional arguments (which I think we need for things like setting which assay should be X).

Possible alternatives:

  • Keep what we currently have in the PR which overloads AnnData() for going SCE/Seurat -> AnnData but only has the adata$to_* interface for the reverse
  • Expose the to_*/from_* functions directly (which is what I was trying to avoid)
  • An S3 type interface (not sure on the exact design)
  • Something else?

lazappi avatar Dec 19 '23 10:12 lazappi

There is no way to provide additional arguments

Good point, I hadn't considered that.

Something else

Would you be ok with splitting it up into:

AnnData <- function(
    obs_names = NULL,
    var_names = NULL,
    X = NULL,
    obs = NULL,
    var = NULL,
    obsm = NULL,
    varm = NULL,
    obsp = NULL,
    varp = NULL,
    uns = NULL,
    output_class = c("InMemoryAnnData", "HDF5AnnData",
    ...
)

And

as_AnnData <- function(
  obj,
  output_class = c("InMemoryAnnData", "HDF5AnnData"),
  ...
)

?

This way, the default as_AnnData will be the inmemory one, which can be used to write to disk using write_h5ad. I do like having the conversion separate from the regular constructor, because then we need to include code to make sure that the obj and the [obs_names, var_names, X, varm, obsm, ... arguments are mutually exclusive while they might as well just be split into two different functions.

rcannood avatar Dec 20 '23 08:12 rcannood

Yeah, that could work. What about the reverse direction?

lazappi avatar Dec 20 '23 14:12 lazappi

I'm not sure what you mean by the reverse direction.

Oh, you mean when we want to convert an AnnData to SCE or Seurat?

We can call adata$to_SingleCellExperiment() and adata$to_Seurat(). Is this what you mean?

rcannood avatar Jun 27 '24 21:06 rcannood

I can't remember exactly but I think so, yes. I think we discussed and I wanted to avoid exposing these directly but I can't remember all the details.

lazappi avatar Jul 01 '24 07:07 lazappi

I'm going to port these changes to a different branch because there are too many conflicts by now (sorry about that!)

In summary, I will:

  • Make sure HDF5AnnData and InMemoryAnnData are not exposed. Users can:
    • Use AnnData() to create InMemoryAnnData's (and convert them to something else)
    • Use read_h5ad() to open an h5ad file as an HDF5AnnData
    • ~Use as_AnnData() to convert a Seurat object or a SingleCellExperiment.~
    • Use from_Seurat() to convert from a Seurat object, and use from_SingleCellExperiment() to convert from a SingleCellExperiment. I'm now thinking that instead of as_AnnData(), the names from_Seurat and from_SingleCellExperiment would better explain what the purpose of those functions are, given that adata$to_Seurat() and adata$to_SingleCellExperiment() exists. In addition, it would allow parameters that are Seurat-specific and parameters that are SingleCellExperiment.

Objects that will be removed from the NAMESPACE:

  • HDF5AnnData
  • InMemoryAnnData
  • to_HDF5AnnData
  • to_InMemoryAnnData
  • to_Seurat
  • to_SingleCellExperiment

At some point in the future, we should create a separate roxygen doc or a vignette to explain what the different possible conversions are.

rcannood avatar Aug 15 '24 18:08 rcannood