Add a `raw` slot

Open LouiseDck opened this issue 9 months ago • 3 comments

At the moment, anndataR completely disregards .raw. This is probably because there's no entry for it in the on-disk specification of anndata.

It is possible to read in a file containing an AnnData object with a .raw attribute without encountering any errors.

Example:

import dummy_anndata as da

testds = da.generate_dataset(x_type = "integer_matrix", 
                             obs_types=["categorical", "integer_array"], 
                             var_types=["integer_array"], 
                             obsm_types=[],
                             varm_types=[],
                             obsp_types=[],
                             varp_types=[],
                             uns_types=[],
                             nested_uns_types=[],
                             layer_types=[],
                             )
testds.raw = testds

testds.write_h5ad("testraw.h5ad")

Then in R:

library("anndataR")

data <- read_h5ad("testraw.h5ad")

However, of course no data$raw exists.

Keeping in mind this conversation about .raw in AnnData, we should probably deal with .raw in some way.

LouiseDck, Apr 09 '25 14:04

There will need to be some changes to the objects, and probably some other details to work out, but the same reading functions should work for .raw. I think it has the same structure, just nested within the file, but we would need to check.

lazappi, Apr 11 '25 11:04

When written to disk, .raw has the same structure as a full object but with only X, var and varm.

import dummy_anndata as da

testds = da.generate_dataset(x_type = "integer_matrix",
                             obs_types=["categorical", "integer_array"],
                             var_types=["integer_array"],
                             obsm_types=["integer_array"],
                             varm_types=["integer_array"],
                             obsp_types=["integer_matrix"],
                             varp_types=["integer_matrix"],
                             uns_types=[],
                             nested_uns_types=[],
                             layer_types=[],
                             )
testds.raw = testds

testds.write_h5ad("testraw2.h5ad")

$ h5ls testraw2.h5ad
X                        Dataset {10, 20}
layers                   Group
obs                      Group
obsm                     Group
obsp                     Group
raw                      Group
uns                      Group
var                      Group
varm                     Group
varp                     Group

$ h5ls testraw2.h5ad/raw
X                        Dataset {10, 20}
var                      Group
varm                     Group

We should be able to read from this if we add a reader, but adding a slot for it that holds a nested object will be more difficult.
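
As a rough way to check this layout from Python, the sketch below lists the members of the top-level raw group with h5py, similar to the h5ls output above. The function name describe_raw is hypothetical and h5py is assumed to be installed. It only inspects the file and says nothing about how anndataR would eventually read it.

```python
import h5py


def describe_raw(path):
    """Map each member of the top-level `raw` group to a short description.

    Returns None when the file has no `raw` group.
    (Hypothetical helper for illustration; not part of anndata or anndataR.)
    """
    with h5py.File(path, "r") as f:
        if "raw" not in f:
            return None
        summary = {}
        for name, item in f["raw"].items():
            if isinstance(item, h5py.Dataset):
                # Dense arrays are stored as datasets with a shape.
                summary[name] = f"Dataset {item.shape}"
            else:
                # Data frames, mappings and sparse matrices are stored as groups.
                summary[name] = "Group"
        return summary
```

For the file listed above, describe_raw("testraw2.h5ad") would be expected to report X as a dataset and var and varm as groups.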

lazappi, Apr 28 '25 12:04

I've just hit this as well. For now, I'll add an extra step to my pipeline that loads the AnnData from disk, extracts the .raw element and writes it to a separate h5ad file that I can read into R via anndataR. However, being able to skip this step would be great.

mschilli87, Oct 17 '25 12:10