Why is the 10x h5 reader implemented in scanpy and not in anndata?

Open LustigePerson opened this issue 6 years ago • 13 comments

I was just wondering if there is a specific reason why the 10x h5 reader function is not implemented in anndata. It would be great if this format could be loaded without the need to load the whole scanpy package first. Most other readers in scanpy are just loaded from anndata.

LustigePerson avatar Aug 19 '19 13:08 LustigePerson

From an api design standpoint, we try to keep AnnData non-specific to single cell. From a process standpoint, the 10x reader was implemented there and never moved. If the function was to move here, we'd probably want to rewrite it first so we wouldn't be adding the tables dependency to AnnData.

ivirshup avatar Aug 27 '19 03:08 ivirshup

Thank you for your response. I was just wondering because all the other readers are located in AnnData and just loaded to scanpy. But I understand that this is a design decision.

LustigePerson avatar Aug 27 '19 06:08 LustigePerson

I mean we can discuss this – is there a reason we don’t want the reader in here?

flying-sheep avatar Sep 02 '19 15:09 flying-sheep

For me it would make sense, as I might want to read data into the anndata format without the need to load the whole scanpy package. But as I understood from @ivirshup, this was a design decision.

LustigePerson avatar Sep 02 '19 16:09 LustigePerson

I doubt that it was. An argument can be made that 10x is more single-cell-transcriptomics specific than anndata itself, but I’m not aware of e.g. loom being used in a different way, so …

flying-sheep avatar Sep 03 '19 08:09 flying-sheep

Hey! Yes, it was a design decision: the idea was that anndata, just like loom, is not limited to biological omics data. scanpy, by contrast, is.

These days, I'm not opposed to making it available from anndata, though. Even if we have 20 or 30 readers, I wouldn't say we have a cluttered API.

falexwolf avatar Sep 03 '19 09:09 falexwolf

I’d say that the only reason for a read function to be scanpy-specific is if it would create scanpy-specific conventions in the AnnData object (such as obsm['X_pca'] or so), but they don’t.

flying-sheep avatar Sep 03 '19 10:09 flying-sheep

I think it would be reasonable to be doing more with 10x files (where CITE-seq gets placed). I'd also want to see if we're going to be doing stuff with the visium data, and what those files look like.

One other issue is that the current 10x readers use tables not h5py and I'd prefer not to add tables as a dependency here. We could rewrite them, but I don't think this is a super high priority – especially for the legacy readers.

ivirshup avatar Sep 07 '19 07:09 ivirshup
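
Editorial illustration (not code from this thread, scanpy, or anndata): a minimal sketch of what an h5py-only reader for the 10x format could look like, as discussed in the comment above. It assumes the v3-style layout, with a `matrix` group holding CSC-encoded counts stored as features × barcodes; the function name `read_10x_h5_sketch` is hypothetical, and legacy (v2) files and extra feature metadata are not handled.

```python
import h5py
import numpy as np
from scipy import sparse


def read_10x_h5_sketch(path):
    """Hypothetical helper: read a 10x v3-style HDF5 file with h5py only."""
    with h5py.File(path, "r") as f:
        grp = f["matrix"]  # assumed v3 layout
        n_features, n_barcodes = (int(n) for n in grp["shape"][()])
        # Counts are column-compressed with shape (features, barcodes)
        counts = sparse.csc_matrix(
            (grp["data"][()], grp["indices"][()], grp["indptr"][()]),
            shape=(n_features, n_barcodes),
        )
        # Strings are stored as bytes; decode them to Python str
        barcodes = np.array([b.decode("utf-8") for b in grp["barcodes"][()]])
        feature_ids = np.array([b.decode("utf-8") for b in grp["features/id"][()]])
    # Transpose so rows are barcodes (observations) and columns are features
    return counts.T.tocsr(), barcodes, feature_ids
```

The returned matrix and labels could then be wrapped in an `AnnData` object by the caller, which would keep `tables` out of the dependency list.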

I just opened a similar issue at Scanpy. It would be really great to have all the readers in one place -- even if it's in a standalone scio package, which would have functions other methods developers could export into their own packages.

adamgayoso avatar Aug 22 '20 23:08 adamgayoso

See also https://scverse.zulipchat.com/#narrow/stream/315789-data-structures/topic/scverse.20io.20package

grst avatar Apr 12 '23 11:04 grst

This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!

github-actions[bot] avatar Jul 12 '23 02:07 github-actions[bot]

Let’s track this in https://github.com/scverse/scverse-io/issues/5

flying-sheep avatar Jul 17 '23 10:07 flying-sheep

I'd rather keep this on track as there's an open PR which fixes this (@gtca, please take a look), and the referenced issue doesn't really track a decision on where this function goes.

ivirshup avatar Jul 17 '23 12:07 ivirshup