Milo: annotate adata from mdata
Description of feature
Is this too simple to be a feature? The idea is to use neighbourhood-level results to annotate the cells. Given a set of neighbourhoods nhoods, the boolean mask over adata.obs would be: adata.obsm['nhoods'][:, nhoods.astype(int).tolist()].sum(axis=1).A1 > 0
I just had a use case for this: using the DA testing results to annotate whether cells belong to enriched or deficient nhoods:
def annotate_adata(
    self,
    mdata: MuData,
    anno_col: str | None = "nhood_da",
    feature_key: str | None = "rna",
    spatialfdr: float | None = 0.05,
    logfc: float | None = 2,
):
    """Assigns a categorical label to cells, based on whether they belong to nhoods that are enriched or deficient according to DA testing results.

    Args:
        mdata: MuData object
        anno_col: Column in adata.obs to hold the annotation labels
        feature_key: If input data is MuData, specify key to cell-level AnnData object.
        spatialfdr: Nhoods with SpatialFDR below this threshold are considered significant.
        logfc: Absolute logFC threshold. Significant nhoods with logFC > logfc are enriched, those with logFC < -logfc are deficient.

    Returns:
        None. Adds in place:
        - `mdata[feature_key].obs[anno_col]`: assigning a label to each cell
    """
    try:
        sample_adata = mdata["milo"]
    except KeyError:
        logger.error(
            "milo_mdata should be a MuData object with two slots: feature_key and 'milo' - please run milopy.count_nhoods(adata) first"
        )
        raise
    adata = mdata[feature_key]
    # Check that the DA testing results are present
    if "SpatialFDR" not in sample_adata.var.columns or "logFC" not in sample_adata.var.columns:
        raise ValueError(
            "mdata['milo'].var['SpatialFDR'] and mdata['milo'].var['logFC'] are not present in the data. Please run milo.da_nhoods() first."
        )
    # Start from a clean column so that labels from a previous run do not persist
    adata.obs[anno_col] = "non"
    significant = sample_adata.var["SpatialFDR"] < spatialfdr
    enriched_nhoods = sample_adata.var_names[significant & (sample_adata.var["logFC"] > logfc)]
    deficient_nhoods = sample_adata.var_names[significant & (sample_adata.var["logFC"] < -logfc)]
    # A cell is flagged if it belongs to at least one of the selected nhoods
    enriched_obs = adata.obsm["nhoods"][:, enriched_nhoods.astype(int).tolist()].sum(axis=1).A1 > 0
    deficient_obs = adata.obsm["nhoods"][:, deficient_nhoods.astype(int).tolist()].sum(axis=1).A1 > 0
    adata.obs.loc[enriched_obs, anno_col] = "enriched"
    adata.obs.loc[deficient_obs, anno_col] = "deficient"
    # Cells that belong to both enriched and deficient nhoods
    confused_obs = enriched_obs & deficient_obs
    if confused_obs.any():
        logger.warning(
            "Some cells belong to both enriched and deficient neighbourhoods. Annotating them as 'mixed'."
        )
        adata.obs.loc[confused_obs, anno_col] = "mixed"
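To make the labelling logic easier to check, here is a minimal toy sketch that runs without MuData. The names label_cells, nhoods and da are hypothetical stand-ins (for the method above, adata.obsm['nhoods'] and mdata['milo'].var), and it assumes the rows of da are in the same order as the columns of the membership matrix, so positional indices are used instead of var_names.astype(int).

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy cell-by-nhood membership matrix (5 cells x 4 nhoods),
# a stand-in for adata.obsm["nhoods"].
nhoods = sparse.csr_matrix(
    np.array(
        [
            [1, 0, 0, 0],  # only in nhood 0
            [1, 0, 1, 0],  # in nhoods 0 and 2
            [0, 0, 1, 0],  # only in nhood 2
            [0, 0, 0, 1],  # only in nhood 3
            [0, 1, 1, 0],  # in nhoods 1 and 2
        ]
    )
)

# Toy DA results, a stand-in for mdata["milo"].var (one row per nhood).
da = pd.DataFrame(
    {
        "SpatialFDR": [0.01, 0.50, 0.01, 0.90],
        "logFC": [3.0, 0.1, -3.0, 0.2],
    }
)


def label_cells(nhoods, da, spatialfdr=0.05, logfc=2.0):
    """Hypothetical helper: label each cell by the DA status of its nhoods."""
    significant = (da["SpatialFDR"] < spatialfdr).to_numpy()
    enriched_idx = np.flatnonzero(significant & (da["logFC"] > logfc).to_numpy())
    deficient_idx = np.flatnonzero(significant & (da["logFC"] < -logfc).to_numpy())

    # A cell is flagged if it is a member of at least one selected nhood.
    in_enriched = np.asarray(nhoods[:, enriched_idx].sum(axis=1)).ravel() > 0
    in_deficient = np.asarray(nhoods[:, deficient_idx].sum(axis=1)).ravel() > 0

    # np.select takes the first matching condition, so "mixed" is listed first.
    return np.select(
        [in_enriched & in_deficient, in_enriched, in_deficient],
        ["mixed", "enriched", "deficient"],
        default="non",
    )


labels = label_cells(nhoods, da)
print(labels.tolist())
```

Using np.asarray(...).ravel() instead of .A1 is a choice made for this sketch so that it also works if the membership matrix is a SciPy sparse array rather than a sparse matrix.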
@brainfo thank you for your suggestion! I'll have a proper look soon.
Hi, I have two more comments on tool/_milo.py, both about the workflow when da_nhoods takes a model_contrasts vector (multiple contrasts). At line 412, if any of the to-be-added result columns is already present, all of them are dropped. But if mdata holds results from a previous test that used a model_contrasts of a different length, the drop can raise an error, because some of the to-be-added result columns are not there to be dropped. So maybe we could drop only the intersection of the existing columns and the to-be-added ones:
cols_to_drop = [col for col in res.columns if col in sample_adata.var.columns]
if cols_to_drop:
    sample_adata.var = sample_adata.var.drop(cols_to_drop, axis=1)
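As a quick illustration of why only the intersection should be dropped, here is a small pandas-only sketch. The frames var and res are toy stand-ins for sample_adata.var and the new results table, and drop_existing is a hypothetical helper, not part of pertpy. It also shows that DataFrame.drop with errors="ignore" gives the same result in one call.

```python
import pandas as pd

# Toy stand-in for sample_adata.var holding results from an earlier run.
var = pd.DataFrame(
    {
        "Nhood_size": [30, 45],
        "logFC": [1.0, -2.0],
        "SpatialFDR": [0.01, 0.20],
    }
)

# Toy stand-in for the new results; "PValue" is not present in var.
res = pd.DataFrame(
    {
        "logFC": [0.5, -1.5],
        "PValue": [0.03, 0.40],
        "SpatialFDR": [0.04, 0.50],
    }
)


def drop_existing(var, new_cols):
    """Hypothetical helper: drop only those new_cols that var already has."""
    cols_to_drop = [col for col in new_cols if col in var.columns]
    return var.drop(columns=cols_to_drop)


# Dropping every to-be-added column fails, because "PValue" is missing.
try:
    var.drop(columns=list(res.columns))
    naive_drop_failed = False
except KeyError:
    naive_drop_failed = True

cleaned = drop_existing(var, res.columns)

# Same outcome using pandas' built-in handling of missing labels.
cleaned_ignore = var.drop(columns=list(res.columns), errors="ignore")

print(naive_drop_failed, list(cleaned.columns))
```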
In that case the result table would have logFC.1, logFC.2, ... instead of a single logFC. So for the visualization wrapper plot_nhood_graph (line 722), one could add a color parameter that takes any field to be mapped as color:
def plot_nhood_graph(  # pragma: no cover # noqa: D417
    self,
    mdata: MuData,
    *,
    color: str = "logFC",  # then replace "logFC" with color inside the function
    alpha: float = 0.1,
    min_logFC: float = 0,
    min_size: int = 10,
    plot_edges: bool = False,
    title: str = "DA log-Fold Change",
    color_map: Colormap | str | None = None,
    palette: str | Sequence[str] | None = None,
    ax: Axes | None = None,
    return_fig: bool = False,
    **kwargs,
) -> Figure | None:
I also added a plot_nhood_counts_violin_by_cond, with hue given by test_var, as an alternative to your barplot and stripplot (single color, and no option other than the stripplot).
if not log_counts:
    sns.violinplot(data=pl_df, x=test_var, y="n_cells", palette=palette, hue=test_var)
    plt.ylabel("# cells")
else:
    sns.violinplot(data=pl_df, x=test_var, y="log_n_cells", palette=palette, hue=test_var)
    plt.ylabel("log(# cells + 1)")
I guess these don't matter much; it's just that I don't want to import yet another custom util.
Thank you immensely for building pertpy!