anndata
(Semi-)automatic conversion of nullable columns to the appropriate pandas arrays
Since #504, AnnData supports nullable int and bool columns in `obs`. Support for strings is planned in #679.
However, this only works if the nullable columns are represented as the appropriate pandas `ExtensionArray` type (e.g. `IntegerArray` with dtype `Int64`, or `BooleanArray` with dtype `boolean`).
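A minimal sketch of the difference, assuming a recent pandas version: a NumPy array containing `None` falls back to the generic `object` dtype, whereas `pd.array` infers a nullable extension type and stores the missing value as `pd.NA`.

```python
import numpy as np
import pandas as pd

# A plain NumPy array containing None falls back to the generic object dtype
raw = np.array([1, 2, None, 3])
print(raw.dtype)  # object

# pd.array infers a nullable extension type and stores None as pd.NA
ints = pd.array([1, 2, None, 3])
bools = pd.array([True, False, None, False])
print(ints.dtype, bools.dtype)  # Int64 boolean
```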
For instance, this

```python
import anndata
import numpy as np
import pandas as pd

adata = anndata.AnnData(
    X=None,
    obs=pd.DataFrame().assign(
        test_int=np.array([1, 2, None, 3]),
        test_bool=[True, False, None, False],
    ),
)
adata.write_h5ad("test.h5ad")
```
fails with `TypeError: Can't implicitly convert non-string objects to strings`.
After converting the columns to pandas arrays, the object can be saved:

```python
# For these columns, pd.array infers Int64 and boolean extension arrays
for c in adata.obs.columns:
    adata.obs[c] = pd.array(adata.obs[c].values)
adata.write_h5ad("test.h5ad")
```
Unfortunately, pandas extension arrays are not widely known, and `None` values can end up in `adata.obs` for various reasons (for instance https://github.com/scverse/scirpy/issues/434).
Should such columns be converted automatically to the appropriate pandas array, e.g. on save?
Or should there be an equivalent of `AnnData.strings_to_categoricals` that can be called to sanitize such columns?
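As a rough sketch of the second option, a sanitizing helper could convert only those `object` columns for which `pd.array` infers a nullable integer or boolean type, and leave strings, mixed-type and categorical columns alone. The name `nullable_to_arrays` is hypothetical and not part of the AnnData API. It works on a plain `DataFrame` and returns a copy, so it could be applied as `adata.obs = nullable_to_arrays(adata.obs)` before calling `write_h5ad`.

```python
import numpy as np
import pandas as pd


def nullable_to_arrays(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not an AnnData API): convert object columns that
    hold ints or bools mixed with None into pandas nullable extension arrays.

    Returns a copy; all other columns are left untouched.
    """
    out = df.copy()
    for col in out.columns:
        if out[col].dtype != object:
            continue  # only object columns can hide None-containing ints/bools
        arr = pd.array(out[col].to_numpy())
        # Keep the conversion only if pandas inferred a nullable int or bool
        # type; strings and mixed-type columns stay as they are.
        if pd.api.types.is_bool_dtype(arr.dtype) or pd.api.types.is_integer_dtype(arr.dtype):
            out[col] = arr
    return out


df = pd.DataFrame(
    {
        "test_int": np.array([1, 2, None, 3]),
        "test_bool": [True, False, None, False],
        "test_str": ["a", None, "b", "c"],
    }
)
print(nullable_to_arrays(df).dtypes)
```

Restricting the conversion to integer and boolean columns mirrors what #504 currently supports; string columns could be added to the same check once #679 lands.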