scirpy
Split IO into separate package
In the scverse core team, we reached the consensus that IO should not be part of the analysis packages (e.g. scanpy, scirpy, muon), but should instead live in independent packages with minimal dependencies that the analysis packages depend on. The hope is that this leads to wider adoption of scverse data structures, since the "dependency cost" of a lightweight IO package is lower than that of an entire framework. This issue tracks the goal of creating such a package for scirpy.
Name (?)
A couple of ideas
- scirpy-io
- scverse-airr
- airr-io
Scope
- All `read_xxx` and `write_xxx` functions in `scirpy.io`
- `AirrCell`, `to_airr_cells` and `from_airr_cells` functions
- ~~`to/from_dandelion`~~ (ideally dandelion adopts the scverse data structure; otherwise these functions should live in dandelion itself)
Maybe
- `merge_airr`
- `index_chains`
- `get.airr`
The latter two go beyond just storing AIRR data as an awkward array and implement the scirpy receptor model. They are likely useful for some other packages, but then again, a package that needs them could just depend on the full scirpy.
In case of doubt, err on the side of including less in the package, as it could be added later if required.
As discussed with @zktuong, it would be nice to refer to the dandelion preprocessing workflow (which addresses some issues with the cellranger output) from this package and/or scirpy. In the end, this shouldn't be hard, as the dandelion pipeline reads cellranger output and writes AIRR, which can be consumed directly by the `read_airr` function.
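To illustrate why this hand-off is simple: an AIRR Rearrangement file is a plain tab-separated table with one row per chain, and an optional `cell_id` column links chains to cells. The sketch below groups such rows by cell using only the Python standard library. It is not scirpy's `read_airr` implementation; the helper name `group_airr_rows_by_cell` is hypothetical, only a handful of AIRR columns are shown, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up example rows (assumption); a real file has many more columns.
EXAMPLE_TSV = (
    "sequence_id\tcell_id\tlocus\tv_call\tj_call\tjunction_aa\tproductive\n"
    "contig_1\tcell_A\tTRA\tTRAV1-2\tTRAJ33\tCAVMDSNYQLIW\tT\n"
    "contig_2\tcell_A\tTRB\tTRBV6-1\tTRBJ2-1\tCASSEAGGNEQFF\tT\n"
    "contig_3\tcell_B\tTRB\tTRBV20-1\tTRBJ1-1\tCSARDRGTEAFF\tT\n"
)


def group_airr_rows_by_cell(handle):
    """Hypothetical helper: map each cell_id to its list of chain records."""
    cells = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        cells[row["cell_id"]].append(row)
    return dict(cells)


# Parse the in-memory example; a real caller would pass an open file instead.
cells = group_airr_rows_by_cell(io.StringIO(EXAMPLE_TSV))
for cell_id, chains in cells.items():
    print(cell_id, [chain["locus"] for chain in chains])
```

In practice one would call scirpy's `read_airr` on the file written by the dandelion pipeline rather than parse it by hand; the point is only that the exchange format itself carries no heavy dependencies.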
tagging @DennisCambridge