Obs reassignment silently allows you to change obs_names

Open Hrovatin opened this issue 2 years ago • 6 comments

When assigning a whole obs DataFrame to an adata that already has obs_names, the index is not checked against them. I would expect adata to complain if the newly assigned obs has a different index than obs_names.

>>> import numpy as np
>>> import pandas as pd
>>> import scanpy as sc
>>> values = [0,1]
>>> values_rev = values[::-1]
>>> a = sc.AnnData(pd.DataFrame(np.array(values).reshape(-1,1), index=[str(i) for  i in values]))
>>> print('X original\n', a.X)
X original
 [[0.]
 [1.]]
>>> print('obs names original',a.obs_names)
obs names original Index(['0', '1'], dtype='object')
>>> o = pd.DataFrame(np.array(values_rev).reshape(-1,1), index=[str(i) for  i in values_rev])
>>> a.obs = o
>>> # Obs names/order change after adding new obs, but X stays the same
>>> print('obs new\n',a.obs)
obs new
    0
1  1
0  0
>>> print('X new\n',a.X)
X new
 [[0.]
 [1.]]

Hrovatin avatar Feb 24 '22 17:02 Hrovatin

Thanks for the suggestion. Could you give an example of a use case where this causes problems?


I would broadly agree that this is weird. It makes more sense for it to be the obs_names and var_names of the AnnData than the obs.index and var.index.

However, we can't control or see when a user does:

adata.obs.index = ...
adata.obs.set_index(..., inplace=True)

I would suspect the two examples I showed are common, and I'm not sure if we can disallow them easily.
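To illustrate the point about in-place changes, here is a minimal sketch using plain pandas. `Container` is a hypothetical stand-in, not AnnData's actual implementation: it validates whole-attribute assignment in a property setter, yet an in-place change to the DataFrame's index never reaches that setter.

```python
import pandas as pd


class Container:
    """Hypothetical stand-in for an object that owns a DataFrame (not AnnData itself)."""

    def __init__(self, names):
        self.names = pd.Index(names)
        self._obs = pd.DataFrame(index=self.names)
        self.setter_calls = 0

    @property
    def obs(self):
        return self._obs

    @obs.setter
    def obs(self, value):
        # Only whole-attribute assignment (c.obs = ...) passes through here.
        self.setter_calls += 1
        if not value.index.equals(self.names):
            raise ValueError("index of new obs does not match existing names")
        self._obs = value


c = Container(["0", "1"])

# Whole-attribute assignment with a reversed index is caught by the setter.
try:
    c.obs = pd.DataFrame({"x": [1, 0]}, index=["1", "0"])
    rejected = False
except ValueError:
    rejected = True

# In-place mutation goes straight to pandas, so the setter never runs.
c.obs.index = ["1", "0"]
bypassed = not c.obs.index.equals(c.names)

print(rejected, bypassed, c.setter_calls)
```

So a check on assignment could catch `adata.obs = df`, but it would not by itself cover the two in-place patterns above.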

ivirshup avatar Feb 24 '22 18:02 ivirshup

So I made an adata without obs (just X with names in var and obs, from an expression data frame). Then at a later point I wanted to add a whole obs df, e.g. adata.obs = df, but this caused the index that was already in adata from creation to change. I would expect that when a new var or obs is assigned, adata would check whether its existing index matches the index of the new df.

Hrovatin avatar Feb 24 '22 19:02 Hrovatin

What was the index in the new dataframe? Was it the right labels, but in the wrong order? Or a totally different set of labels?

ivirshup avatar Feb 25 '22 13:02 ivirshup

It was the right labels in the wrong order. I would expect an error for a totally wrong index; for the wrong order, I'm not sure whether it should be an error or a reordering (as in pandas).
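A minimal sketch of the behaviour described here, done on the user side with plain pandas. `align_to_names` is a hypothetical helper, not part of anndata; it assumes the labels are unique, reorders when the labels match in a different order, and raises when they do not match at all.

```python
import pandas as pd


def align_to_names(new_obs, obs_names):
    """Hypothetical helper: return new_obs reordered to match obs_names, or raise."""
    obs_names = pd.Index(obs_names)
    if new_obs.index.equals(obs_names):
        return new_obs
    # Same labels in a different order: unique, same length, all present.
    same_labels = (
        new_obs.index.is_unique
        and len(new_obs.index) == len(obs_names)
        and new_obs.index.isin(obs_names).all()
    )
    if not same_labels:
        raise ValueError("new obs index does not contain the same labels as obs_names")
    # Reorder rows so they line up with the existing names.
    return new_obs.loc[obs_names]


names = ["0", "1"]
reversed_obs = pd.DataFrame({"value": [1, 0]}, index=["1", "0"])
aligned = align_to_names(reversed_obs, names)
print(aligned)
```

With an AnnData object this could be used as `adata.obs = align_to_names(df, adata.obs_names)`, so the rows are aligned before the assignment happens.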

Hrovatin avatar Feb 25 '22 16:02 Hrovatin

An example of when this would be an issue is when you're plotting cell data colored by annotation. If the annotation doesn't correspond to the cell, it becomes meaningless.

jeanettejohnson avatar Apr 09 '22 17:04 jeanettejohnson

This issue has been automatically marked as stale because it has not had recent activity. Please add a comment if you want to keep the issue open. Thank you for your contributions!

github-actions[bot] avatar Jun 20 '23 02:06 github-actions[bot]