pertpy Milo: annotate adata from mdata

Description of feature

Is this too simple to be a feature? The idea is to annotate cells from neighbourhood-level features. Given a set of neighbourhood indices nhoods, the boolean mask over the rows of adata.obs would be: adata.obsm['nhoods'][:, nhoods.astype(int).tolist()].sum(axis=1).A1 > 0

My current use case is to take the DA testing results and annotate whether cells belong to enriched or deficient nhoods:
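
As a minimal sketch of that mask on toy data: the matrix below is a made-up stand-in for adata.obsm['nhoods'] (5 cells by 3 neighbourhoods), and the names nhoods_matrix and selected are only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy cell-by-neighbourhood membership matrix (5 cells x 3 neighbourhoods),
# standing in for adata.obsm["nhoods"]. Values are invented for illustration.
nhoods_matrix = csr_matrix(
    np.array(
        [
            [1, 0, 0],
            [1, 1, 0],
            [0, 0, 1],
            [0, 0, 0],
            [0, 1, 1],
        ]
    )
)

# Neighbourhood indices of interest, stored as strings like var_names.
selected = np.array(["0", "1"])

# A cell is flagged if it belongs to at least one selected neighbourhood.
col_idx = selected.astype(int).tolist()
in_selected = np.asarray(nhoods_matrix[:, col_idx].sum(axis=1)).ravel() > 0
print(in_selected)
```

np.asarray(...).ravel() is used here instead of .A1 so the same line also works if the membership matrix is a SciPy sparse array rather than a sparse matrix.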

    def annotate_adata(
        self,
        mdata: MuData,
        anno_col: str = "nhood_da",
        feature_key: str | None = "rna",
        spatialfdr: float = 0.05,
        logfc: float = 2,
    ):
        """Assigns a categorical label to each cell, based on whether it belongs to nhoods found enriched or deficient by DA testing.

        Args:
            mdata: MuData object
            anno_col: Column in adata.obs to hold the annotation labels
            feature_key: If input data is MuData, specify key to cell-level AnnData object.
            spatialfdr: Nhoods with SpatialFDR below this threshold count as significant.
            logfc: Significant nhoods with logFC above `logfc` are enriched, below `-logfc` deficient.

        Returns:
            None. Adds in place:
            - `mdata[feature_key].obs[anno_col]`: assigning a label to each cell
        """
        try:
            sample_adata = mdata["milo"]
        except KeyError:
            logger.error(
                "milo_mdata should be a MuData object with two slots: feature_key and 'milo' - please run milopy.count_nhoods(adata) first"
            )
            raise
        adata = mdata[feature_key]

        # check that the DA testing results are present
        if "SpatialFDR" not in sample_adata.var.columns or "logFC" not in sample_adata.var.columns:
            raise ValueError(
                "mdata['milo'].var['SpatialFDR'] and mdata['milo'].var['logFC'] are not present in the data. Please run milo.da_nhoods() first."
            )

        # Initialise the column with a default label if it does not exist yet
        if anno_col not in adata.obs.columns:
            adata.obs[anno_col] = "non"

        significant = sample_adata.var["SpatialFDR"] < spatialfdr
        enriched_nhoods = sample_adata.var_names[significant & (sample_adata.var["logFC"] > logfc)]
        deficient_nhoods = sample_adata.var_names[significant & (sample_adata.var["logFC"] < -logfc)]
        # A cell is flagged if it belongs to at least one nhood in the set
        enriched_obs = adata.obsm["nhoods"][:, enriched_nhoods.astype(int).tolist()].sum(axis=1).A1 > 0
        deficient_obs = adata.obsm["nhoods"][:, deficient_nhoods.astype(int).tolist()].sum(axis=1).A1 > 0
        adata.obs.loc[enriched_obs, anno_col] = "enriched"
        adata.obs.loc[deficient_obs, anno_col] = "deficient"
        # Cells in both enriched and deficient nhoods are labelled last so "mixed" wins
        confused_obs = enriched_obs & deficient_obs
        if confused_obs.any():
            logger.warning(
                "Some cells belong to both enriched and deficient neighbourhoods. Annotating them as mixed."
            )
            adata.obs.loc[confused_obs, anno_col] = "mixed"
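
The labelling order can be checked on toy data. This is only a sketch: the masks and cell names below are invented, and a plain pandas Series stands in for adata.obs[anno_col].

```python
import numpy as np
import pandas as pd

# Toy per-cell masks (assumed values for illustration only).
cells = ["cell0", "cell1", "cell2", "cell3"]
enriched_obs = np.array([True, True, False, False])
deficient_obs = np.array([False, True, True, False])

# Stand-in for adata.obs[anno_col], initialised with the default label.
labels = pd.Series("non", index=cells)
labels.loc[enriched_obs] = "enriched"
labels.loc[deficient_obs] = "deficient"

# Cells in both kinds of neighbourhood; assigned last so "mixed" wins.
mixed_obs = enriched_obs & deficient_obs
if mixed_obs.any():  # len(mixed_obs) is the number of cells, whatever the mask holds
    labels.loc[mixed_obs] = "mixed"

print(labels.tolist())
```

Since the length of a boolean mask is always the number of cells, .any() is the check that tells whether at least one cell is in both groups.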

Apr 26 '25 17:04 brainfo

@brainfo thank you for your suggestion! I'll have a proper look soon.

May 15 '25 21:05 Zethson

Hi, I have two more comments on tool/_milo.py, for the workflow where da_nhoods takes a model_contrasts vector (multiple contrasts). At line 412, if any of the to-be-added result columns already exists, all of them are dropped. But if mdata holds results from a previous test run with a model_contrasts of a different length, the drop can raise an error, because some of the to-be-added columns are not there to be dropped. So maybe we could drop only the intersection of the existing columns and the to-be-added ones:

        cols_to_drop = [col for col in res.columns if col in sample_adata.var.columns]
        if cols_to_drop:
            sample_adata.var = sample_adata.var.drop(cols_to_drop, axis=1)
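
A small pandas-only sketch of the failure and the fix, on invented data: var stands in for sample_adata.var after an earlier run with two contrasts, and res for new results with three. The values are made up for illustration.

```python
import pandas as pd

# Stand-in for sample_adata.var after an earlier run with two contrasts.
var = pd.DataFrame(
    {"Nhood_size": [40, 55], "logFC.1": [0.5, -1.2], "logFC.2": [0.1, 0.3]},
    index=["0", "1"],
)

# New results from a run with three contrasts; "logFC.3" is not in var yet.
res = pd.DataFrame(
    {"logFC.1": [0.4, -1.0], "logFC.2": [0.2, 0.1], "logFC.3": [1.5, -0.7]},
    index=var.index,
)

# Dropping every new column name fails because "logFC.3" is missing.
try:
    var.drop(res.columns, axis=1)
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# Dropping only the columns that actually exist avoids the error.
cols_to_drop = [col for col in res.columns if col in var.columns]
if cols_to_drop:
    var = var.drop(cols_to_drop, axis=1)
var = pd.concat([var, res], axis=1)
print(list(var.columns))
```

An equivalent option would be var.drop(res.columns, axis=1, errors="ignore"), which skips labels that are not present.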

In that case the result table has logFC.1, logFC.2, ... instead of a single logFC. So for the visualization wrapper plot_nhood_graph (line 722), one could add a color parameter that takes any field to be mapped to color:

    def plot_nhood_graph(  # pragma: no cover # noqa: D417
        self,
        mdata: MuData,
        *,
        color: str = "logFC",  # then use color in place of the hard-coded "logFC" inside the function
        alpha: float = 0.1,
        min_logFC: float = 0,
        min_size: int = 10,
        plot_edges: bool = False,
        title: str = "DA log-Fold Change",
        color_map: Colormap | str | None = None,
        palette: str | Sequence[str] | None = None,
        ax: Axes | None = None,
        return_fig: bool = False,
        **kwargs,
    ) -> Figure | None:

I also added a plot_nhood_counts_violin_by_cond, with hue given by test_var, as an alternative to your barplot and stripplot (single color, no choice for the stripplot).

        if not log_counts:
            sns.violinplot(data=pl_df, x=test_var, y="n_cells", palette=palette, hue=test_var)
            plt.ylabel("# cells")
        else:
            sns.violinplot(data=pl_df, x=test_var, y="log_n_cells", palette=palette, hue=test_var)
            plt.ylabel("log(# cells + 1)")

I guess these don't matter much; I would just rather not import another custom utility.

Thank you immensely for building pertpy!

May 16 '25 06:05 brainfo