Split IO into separate package

Open grst opened this issue 2 years ago • 2 comments

The scverse core team reached a consensus that IO should not be part of the analysis packages (e.g. scanpy, scirpy, muon), but should instead live in independent packages with minimal dependencies that the analysis packages depend on. The hope is that this leads to wider adoption of scverse data structures, since the "dependency cost" of depending on a lightweight IO package is lower than that of depending on an entire framework. This issue tracks the goal of creating such a package for scirpy.

Name (?)

A couple of ideas

  • scirpy-io
  • scverse-airr
  • airr-io

Scope

  • All read_xxx and write_xxx functions in scirpy.io
  • AirrCell, to_airr_cells and from_airr_cells functions
  • ~~to/from_dandelion~~ (ideally dandelion adopts the scverse data structure; otherwise these functions should live in dandelion itself)

Maybe

  • merge_airr
  • index_chains
  • get.airr

The latter two go beyond just storing AIRR data as an awkward array and implement the scirpy receptor model. They are likely useful for some other packages, but then again, if a method needs them, it could just depend on the full scirpy.

In case of doubt, err on the side of including less in the package, as it could be added later if required.

grst avatar Mar 16 '23 09:03 grst

As discussed with @zktuong, it would be nice to refer to the dandelion preprocessing workflow (which addresses some issues with the cellranger output) from this package and/or scirpy. In the end, this shouldn't be hard, as the dandelion pipeline reads cellranger output and writes AIRR, which can be consumed directly by the read_airr function.
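
To illustrate why AIRR works as the hand-off format, here is a minimal, standard-library-only sketch of reading an AIRR Rearrangement TSV and grouping the rows by cell. It is not scirpy's read_airr implementation: the helper name `read_airr_by_cell` and the example rows are made up for illustration, and only the column names (`sequence_id`, `cell_id`, `locus`, `junction_aa`, `productive`) come from the AIRR Rearrangement schema.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-in for an AIRR Rearrangement TSV, e.g. as written by an upstream
# preprocessing pipeline. The rows are invented for illustration only.
AIRR_TSV = (
    "sequence_id\tcell_id\tlocus\tjunction_aa\tproductive\n"
    "contig_1\tcell_A\tTRA\tCAVSDYKLSF\tT\n"
    "contig_2\tcell_A\tTRB\tCASSLGQAYEQYF\tT\n"
    "contig_3\tcell_B\tTRB\tCASSPGTGELFF\tF\n"
)


def read_airr_by_cell(handle):
    """Group rearrangement rows by `cell_id`.

    Hypothetical helper for this sketch; not part of the scirpy API.
    """
    cells = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        # This sketch assumes booleans are written as "T"/"F" in the TSV.
        row["productive"] = row["productive"] == "T"
        cells[row["cell_id"]].append(row)
    return dict(cells)


cells = read_airr_by_cell(io.StringIO(AIRR_TSV))
for cell_id, chains in cells.items():
    print(cell_id, [(c["locus"], c["junction_aa"]) for c in chains])
```

In practice one would call read_airr on the file written by the dandelion pipeline. The sketch only shows that the format itself needs nothing beyond a TSV parser, which is the point of keeping the IO package lightweight.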

grst avatar Mar 16 '23 09:03 grst

tagging @DennisCambridge

zktuong avatar Mar 23 '23 00:03 zktuong