Lambda indexer for anndata

With pandas DataFrames, a pattern I use very commonly is:

df_with_very_long_descriptive_name.loc[lambda x: x["fruit"] == "banana", :]
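For context, pandas' `.loc` accepts a callable in place of an indexer: it is called with the object being indexed, and its return value (here a boolean Series) is used for the selection. A minimal illustration with made-up data:

```python
import pandas as pd

# Toy data; the column name mirrors the example above.
df_with_very_long_descriptive_name = pd.DataFrame(
    {"fruit": ["apple", "banana", "cherry", "banana"], "count": [3, 5, 2, 7]}
)

# The lambda receives the DataFrame itself, so its long name is written only once.
bananas = df_with_very_long_descriptive_name.loc[lambda x: x["fruit"] == "banana", :]
print(bananas)
```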

I was wondering if it would be possible to have lambda-based indexers for anndata as well, e.g.

adata_t = adata[lambda x: x.obs["cell_type"] == "T cell", :]
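One way such an indexer could work is to resolve any callable in the index by calling it with the object before the normal indexing logic runs. Below is a minimal sketch on a hypothetical stand-in class (`ToyAnnData` is illustrative only, not anndata's implementation), with `obs` as a pandas DataFrame and `X` as a NumPy array:

```python
import numpy as np
import pandas as pd


class ToyAnnData:
    """Hypothetical stand-in for AnnData, used only to illustrate the idea."""

    def __init__(self, X, obs):
        self.X = np.asarray(X)
        self.obs = obs

    def __getitem__(self, index):
        # Normalise to an (obs, var) pair, like adata[obs_idx, var_idx].
        if not isinstance(index, tuple):
            index = (index, slice(None))
        # Resolve callables by calling them with the object being indexed.
        obs_idx, var_idx = (i(self) if callable(i) else i for i in index)
        # Turn a boolean pandas Series into a plain NumPy mask.
        if isinstance(obs_idx, pd.Series):
            obs_idx = obs_idx.to_numpy()
        return ToyAnnData(self.X[obs_idx][:, var_idx], self.obs.iloc[obs_idx])


adata = ToyAnnData(
    X=np.arange(6).reshape(3, 2),
    obs=pd.DataFrame({"cell_type": ["T cell", "B cell", "T cell"]}),
)

# The object name appears only once in the expression.
adata_t = adata[lambda x: x.obs["cell_type"] == "T cell", :]
print(adata_t.X)
```

Resolving callables up front keeps the rest of the indexing path unchanged, which is roughly how pandas treats callables in `.loc`.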

The benefits of this approach are imo:

- no need to duplicate the object name (which becomes annoying if it is not just named adata)
- fewer copy-and-paste errors. I fairly often end up with something like adata_t[adata.obs["something"], :], because I forget to update the second variable name.

Apr 13 '22 17:04 grst

Why not simply

adata_t = adata[adata.obs.query("cell_type == 'T cell'").index]

?
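For reference, `DataFrame.query` returns the matching rows of `obs`, and `.index` gives their labels, which AnnData accepts as an obs indexer. The string literal inside the query needs plain ASCII quotes. A pandas-only check on toy data that this picks the same labels as a boolean mask:

```python
import pandas as pd

# Toy obs table with string labels, as in a typical AnnData.obs.
obs = pd.DataFrame(
    {"cell_type": ["T cell", "B cell", "T cell", "NK cell"]},
    index=["c1", "c2", "c3", "c4"],
)

# Labels selected via query() ...
via_query = obs.query("cell_type == 'T cell'").index
# ... and via an explicit boolean mask.
via_mask = obs.index[obs["cell_type"] == "T cell"]

print(list(via_query))
```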

Apr 13 '22 18:04 dawe

Because I need to type adata twice in the same expression ;)

(Sent by email in reply to Davide Cittaro's comment of Wed, Apr 13, 2022, 20:16.)

Apr 13 '22 18:04 grst

Great idea; pipelining does need to be possible.

Apr 19 '22 10:04 flying-sheep

I think this has been discussed before, and I would definitely be in favor of something in this direction. I had thought of maybe doing it via a .select method.

I'm not sure I love lambdas as the recommended way to do something like this, though. They're only situationally more concise than typing adata twice, and I think we could get more out of alternative approaches.

I've been wondering about doing something more like polars or datafusion, i.e. declarative expressions. These would also pair well with path-based access (e.g. "obsm/X_pca").

Apr 19 '22 17:04 ivirshup

It's not about verbosity; it's about reusing an object in a pipeline without interrupting the pipeline to write imperative code.

But yes, declarative code like polars is even better!
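The pipelining argument can be illustrated with pandas, where a callable lets a method chain refer to an intermediate result that never gets a variable name of its own. This uses toy data and pandas only; an AnnData version would depend on how (or whether) callable indexers get implemented:

```python
import pandas as pd

counts = pd.DataFrame(
    {"cell_type": ["T cell", "B cell", "T cell"], "n_genes": [1200, 800, 300]}
)

result = (
    counts
    # Add a derived column; the resulting frame has no variable name.
    .assign(is_t=lambda d: d["cell_type"] == "T cell")
    # Filter on that column: `counts` lacks it, so only a callable can refer
    # to the intermediate frame without breaking the chain.
    .loc[lambda d: d["is_t"] & (d["n_genes"] > 500)]
    .drop(columns="is_t")
)
print(result)
```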

Apr 20 '22 09:04 flying-sheep

Rough sketch of a potential API:

from typing import Any, Optional, Union

Idxer = Any  # placeholder for whatever indexer types end up being accepted

def select(
    self,
    identifiers: Union[str, list[str]] = "*",
    *,
    obs: Optional[Idxer] = None,
    var: Optional[Idxer] = None,
    copy: bool = False,
):
    """
    Return a new AnnData with selected elements at selected indices.

    By default, does not copy data unless necessary.

    Usage
    -----

    >>> adata.select(
    ...     ["obsm/X_pca", "obs/cell_type"],
    ...     obs=po.col("cell_type") == "B Cell",
    ... )
    AnnData object with n_obs × n_vars = 342 × 13714
        obs: 'cell_type'
        obsm: 'X_pca'
    """
    ...
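To show how the pieces of the sketch above could fit together, here is a toy, self-contained version on a minimal container. Everything here is hypothetical: `ToyAnnData`, `Col`/`col` (standing in for `po.col`), and the "obs/<column>" / "obsm/<key>" path handling mirror the proposal rather than any existing anndata API; only the `obs` filter is implemented, and copying behaviour is not modelled.

```python
import numpy as np
import pandas as pd


class Col:
    """Hypothetical deferred column reference (stands in for `po.col`)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        # Defer the comparison: return a function of the obs table.
        return lambda obs: (obs[self.name] == value).to_numpy()


def col(name):
    return Col(name)


class ToyAnnData:
    """Hypothetical container with X, obs and obsm, for illustration only."""

    def __init__(self, X, obs, obsm):
        self.X, self.obs, self.obsm = X, obs, obsm

    def select(self, identifiers="*", *, obs=None):
        # Evaluate the deferred obs filter against this object's own obs.
        rows = slice(None) if obs is None else obs(self.obs)
        if identifiers == "*":
            identifiers = [f"obs/{c}" for c in self.obs.columns]
            identifiers += [f"obsm/{k}" for k in self.obsm]
        elif isinstance(identifiers, str):
            identifiers = [identifiers]
        # Split "obs/<column>" and "obsm/<key>" paths by their prefix.
        obs_cols = [p.split("/", 1)[1] for p in identifiers if p.startswith("obs/")]
        obsm_keys = [p.split("/", 1)[1] for p in identifiers if p.startswith("obsm/")]
        return ToyAnnData(
            X=self.X[rows],
            obs=self.obs.iloc[rows][obs_cols],
            obsm={k: self.obsm[k][rows] for k in obsm_keys},
        )


adata = ToyAnnData(
    X=np.arange(8).reshape(4, 2),
    obs=pd.DataFrame(
        {"cell_type": ["B Cell", "T cell", "B Cell", "NK"], "n_genes": [5, 6, 7, 8]}
    ),
    obsm={"X_pca": np.arange(12).reshape(4, 3)},
)
sub = adata.select(["obsm/X_pca", "obs/cell_type"], obs=col("cell_type") == "B Cell")
print(sub.X.shape, list(sub.obs.columns), list(sub.obsm))
```

Because the filter is built without reference to `adata`, the object name appears only once, which addresses the same duplication problem as the lambda indexer.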

Apr 20 '22 14:04 ivirshup
