anndataR
Tidy user interface
Tidy the user interface to reduce the exported functions to ones we think users should see.
Changes:
- [x] Un-export to/from functions
- [x] Modify `AnnData()` to accept `SCE`/`Seurat` (replacing `from_*` functions)
- [x] Minor updates to vignette
- [x] Creation of man pages for conversion (so arguments are visible)
- [x] Check default returned objects
- [x] Update exported examples to use `generate_dataset()`
Todo:
- [ ] Move in-memory stuff to separate vignette (maybe)
- [ ] Add tests/examples for `AnnData()` with `SCE`/`Seurat` input
- [ ] Check `read_h5ad()`/`write_h5ad()` tests
I would like to do the test/example stuff, but it requires a working Seurat converter. I think maybe it's easier to merge this first? The vignette would be nice but can be done later.
- Why would a user ever need to create a `HDF5AnnData`? The `read`/`write` functions handle going to/from files and I can't think of a reason to interact with one of these objects directly
- I thought about doing the S3 version, happy to switch to that. Whether it is one function or two I guess is a design decision.
- The `AnnData()` function I think has the same name as the one in the R {anndata} package. I'm not sure how much of an issue that is but maybe we should avoid clashing just in case?
@lazappi Just to confirm, after what we discussed, do you agree with the following?
- `AnnData()` will return an `InMemoryAnnData` (default) or an `HDF5AnnData` if the user really wants to. It should be noted that this is only for users who know what they're doing.
- Users can already convert their anndata using `adata$to_SingleCellExperiment()` or `adata$to_InMemoryAnnData()`, but it would be nice if this is also possible with `as(adata, "SingleCellExperiment")`.
- Make sure internal classes (`InMemoryAnnData` and `HDF5AnnData`) and internal functions (`from_*` and `to_*`) are not exported.
- Users are recommended to use `read_h5ad()` and `write_h5ad()` to read/write their data from/to `.h5ad` files
I'd like to merge this PR because I agree that it'd be nice to clean up our list of exports. Would you be able to make the changes to AnnData()?
Yes, I think so. I'll try to work on it, assuming I can work out how to implement as() properly.
Ok, so using as() probably won't work for us because there is no way to provide additional arguments (which I think we need for things like setting which assay should be X).
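To illustrate the limitation, here is a minimal sketch using toy S4 classes (`ToyExperiment`, `ToyAnnData`, and `toy_as_AnnData()` are hypothetical stand-ins, not anndataR code). A coercion method registered with `setAs()` is a function of `from` only, so a choice such as "which assay becomes `X`" has to be hard-coded, whereas a plain function can expose it as an argument.

```r
library(methods)

# Toy stand-ins for the real classes (hypothetical, for illustration only)
setClass("ToyExperiment", slots = c(assays = "list"))
setClass("ToyAnnData", slots = c(X = "matrix"))

# A coercion method registered with setAs() receives only `from`,
# so the choice of assay has to be hard-coded here.
setAs("ToyExperiment", "ToyAnnData", function(from) {
  new("ToyAnnData", X = from@assays[[1]])
})

# A plain function can expose the choice as an argument instead.
toy_as_AnnData <- function(obj, x_assay = 1) {
  new("ToyAnnData", X = obj@assays[[x_assay]])
}

toy <- new("ToyExperiment", assays = list(
  counts = matrix(1:4, nrow = 2),
  logcounts = matrix(log1p(1:4), nrow = 2)
))

via_as <- as(toy, "ToyAnnData")                        # always uses the first assay
via_fun <- toy_as_AnnData(toy, x_assay = "logcounts")  # caller picks the assay
```

With `as()`, the caller has no way to request `logcounts`; with the function, it is a single named argument.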
Possible alternatives:
- Keep what we currently have in the PR which overloads `AnnData()` for going `SCE/Seurat -> AnnData` but only has the `adata$to_*` interface for the reverse
- Expose the `to_*`/`from_*` functions directly (which is what I was trying to avoid)
- An `S3` type interface (not sure on the exact design)
- Something else?
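For the S3 option, one possible shape (the exact design is left open in the thread) is a generic that dispatches on the input class, with each method free to define its own extra arguments. Everything below is a hypothetical sketch on toy classes: `as_AnnData_sketch()`, `toy_sce`, `toy_seurat`, `x_assay`, and `layer` are illustrative names only.

```r
# Hypothetical generic: dispatches on the class of `obj` and forwards extras.
as_AnnData_sketch <- function(obj, ...) {
  UseMethod("as_AnnData_sketch")
}

# Method for a toy "experiment" class; `x_assay` is an illustrative argument.
as_AnnData_sketch.toy_sce <- function(obj, x_assay = "counts", ...) {
  structure(list(X = obj$assays[[x_assay]], source = "toy_sce"),
            class = "toy_anndata")
}

# Method for a second toy class with a different class-specific argument.
as_AnnData_sketch.toy_seurat <- function(obj, layer = "data", ...) {
  structure(list(X = obj$layers[[layer]], source = "toy_seurat"),
            class = "toy_anndata")
}

# Fallback that fails clearly for unsupported inputs.
as_AnnData_sketch.default <- function(obj, ...) {
  stop("No conversion defined for class: ", class(obj)[1])
}

sce <- structure(
  list(assays = list(counts = diag(2), logcounts = diag(2) * 2)),
  class = "toy_sce"
)
seu <- structure(list(layers = list(data = diag(3))), class = "toy_seurat")

from_sce <- as_AnnData_sketch(sce, x_assay = "logcounts")
from_seu <- as_AnnData_sketch(seu)
```

Unlike `as()`, this keeps a single entry point while still allowing per-class arguments, at the cost of those arguments being less visible in the generic's signature.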
> There is no way to provide additional arguments
Good point, I hadn't considered that.
> Something else
Would you be ok with splitting it up into:
AnnData <- function(
  obs_names = NULL,
  var_names = NULL,
  X = NULL,
  obs = NULL,
  var = NULL,
  obsm = NULL,
  varm = NULL,
  obsp = NULL,
  varp = NULL,
  uns = NULL,
  output_class = c("InMemoryAnnData", "HDF5AnnData"),
  ...
)
And
as_AnnData <- function(
  obj,
  output_class = c("InMemoryAnnData", "HDF5AnnData"),
  ...
)
?
This way, the default as_AnnData will be the in-memory one, which can be used to write to disk using write_h5ad. I do like having the conversion separate from the regular constructor, because otherwise we need to include code to make sure that obj and the obs_names, var_names, X, varm, obsm, ... arguments are mutually exclusive, while they might as well just be split into two different functions.
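As a small aside on why the in-memory class ends up as the default: with a vector default such as `c("InMemoryAnnData", "HDF5AnnData")`, the usual R idiom is `match.arg()`, which picks the first element when the argument is not supplied. The sketch below assumes that idiom (the thread does not show the function body) and uses a hypothetical `make_adata()` helper that just returns a list.

```r
# Hypothetical helper showing how a vector default resolves via match.arg().
make_adata <- function(X, output_class = c("InMemoryAnnData", "HDF5AnnData")) {
  # When `output_class` is not supplied, match.arg() returns the first choice;
  # an unknown value raises an error.
  output_class <- match.arg(output_class)
  list(X = X, class_name = output_class)
}

default_obj <- make_adata(matrix(0, 2, 2))
hdf5_obj <- make_adata(matrix(0, 2, 2), output_class = "HDF5AnnData")
```

Here `default_obj$class_name` is `"InMemoryAnnData"`, and passing a value outside the two choices fails early instead of silently falling back.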
Yeah, that could work. What about the reverse direction?
I'm not sure what you mean by the reverse direction.
Oh, you mean when we want to convert an AnnData to SCE or Seurat?
We can call adata$to_SingleCellExperiment() and adata$to_Seurat(). Is this what you mean?
I can't remember exactly but I think so, yes. I think we discussed and I wanted to avoid exposing these directly but I can't remember all the details.
I'm going to port these changes to a different branch because there are too many conflicts by now (sorry about that!)
In summary, I will:
- Make sure HDF5AnnData and InMemoryAnnData are not exposed. Users can:
  - Use `AnnData()` to create InMemoryAnnData's (and convert them to something else)
  - Use `read_h5ad()` to open an h5ad file as an HDF5AnnData
  - ~Use `as_AnnData()` to convert a Seurat object or a SingleCellExperiment.~
  - Use `from_Seurat()` to convert from a Seurat object, and use `from_SingleCellExperiment()` to convert from a SingleCellExperiment. I'm now thinking that instead of `as_AnnData()`, the names `from_Seurat` and `from_SingleCellExperiment` would better explain what the purpose of those functions is, given that `adata$to_Seurat()` and `adata$to_SingleCellExperiment()` exist. In addition, it would allow parameters that are Seurat-specific and parameters that are SingleCellExperiment-specific.
- Use
Objects that will be removed from the NAMESPACE:
- HDF5AnnData
- InMemoryAnnData
- to_HDF5AnnData
- to_InMemoryAnnData
- to_Seurat
- to_SingleCellExperiment
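A rough sketch of how exported and un-exported functions differ in source, assuming the package generates its NAMESPACE with roxygen2: only objects tagged `@export` get an `export()` entry, while `@keywords internal` keeps a man page (so arguments stay visible) but hides it from the help index. The functions `from_toy()` and `to_toy_list()` are hypothetical and unrelated to the real converters.

```r
#' Convert a toy object (exported and documented)
#'
#' @param obj A list with an `X` element.
#' @param transpose Whether to transpose `X`.
#' @export
from_toy <- function(obj, transpose = FALSE) {
  X <- if (transpose) t(obj$X) else obj$X
  structure(list(X = X), class = "toy_anndata")
}

#' Convert back to a plain list (documented, but not exported)
#'
#' There is no `@export` tag, so roxygen2 writes no export() entry for this
#' function. `@keywords internal` still produces a man page.
#'
#' @param adata A `toy_anndata` object.
#' @keywords internal
to_toy_list <- function(adata) {
  list(X = adata$X)
}

adata <- from_toy(list(X = matrix(1:6, nrow = 2)), transpose = TRUE)
roundtrip <- to_toy_list(adata)
```

Inside the package both functions are callable as usual; users would only see `from_toy()` after `library()`.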
At some point in the future, we should create a separate roxygen doc or a vignette to explain what the different possible conversions are.