anndata
Obs reassignment silently allows you to change obs_names
When assigning a whole obs DataFrame to an adata that already has obs_names, the new index is not checked against the existing one. I would expect adata to complain if the newly assigned obs has a different index than obs_names.
>>> import numpy as np
>>> import pandas as pd
>>> import scanpy as sc
>>> values = [0,1]
>>> values_rev = values[::-1]
>>> a = sc.AnnData(pd.DataFrame(np.array(values).reshape(-1,1), index=[str(i) for i in values]))
>>> print('X original\n', a.X)
X original
[[0.]
 [1.]]
>>> print('obs names original',a.obs_names)
obs names original Index(['0', '1'], dtype='object')
>>> o = pd.DataFrame(np.array(values_rev).reshape(-1,1), index=[str(i) for i in values_rev])
>>> a.obs = o
>>> # Obs names/order change after adding new obs, but X stays the same
>>> print('obs new\n',a.obs)
obs new
   0
1  1
0  0
>>> print('X new\n',a.X)
X new
[[0.]
 [1.]]
Thanks for the suggestion. Could you give an example of a use case where this causes problems?
I would broadly agree that this is weird. It makes more sense to treat these as the obs_names, var_names of the AnnData than as the obs.index, var.index.
However, we can't control or see when a user does:
adata.obs.index = ...
adata.obs.set_index(..., inplace=True)
I would suspect the two examples I showed are common, and I'm not sure if we can disallow them easily.
I made an adata without obs (just X, with names in var and obs, built from an expression data frame). At a later point I wanted to add a whole obs DataFrame, e.g. adata.obs=df, but this changed the index that adata already had from creation. I would expect that when a new var or obs is assigned, adata checks whether its existing index matches the index of the new DataFrame.
What was the index in the new dataframe? Was it the right labels, but in the wrong order? Or a totally different set of labels?
It was the right labels in the wrong order. I would expect an error for a totally wrong index. For the wrong order, I'm not sure whether it should be an error or a reordering (as in pandas).
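The behavior described above (error on different labels, reordering on the same labels in a different order) could be sketched roughly as follows. This is only an illustration written against pandas, not anything anndata provides: `align_obs` is a hypothetical helper name, and it assumes the existing obs_names are unique.

```python
import pandas as pd


def align_obs(obs_names: pd.Index, new_obs: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return new_obs ordered like obs_names, or raise."""
    if not new_obs.index.is_unique:
        raise ValueError("new obs index contains duplicate labels")
    missing = obs_names.difference(new_obs.index)
    extra = new_obs.index.difference(obs_names)
    if len(missing) or len(extra):
        raise ValueError(
            "obs index does not match obs_names: "
            f"missing={list(missing)}, extra={list(extra)}"
        )
    # Same labels, possibly in a different order: reorder rows by label.
    return new_obs.loc[obs_names]


obs_names = pd.Index(["0", "1"])
reordered = pd.DataFrame({"value": [1, 0]}, index=["1", "0"])
aligned = align_obs(obs_names, reordered)
```

With a helper like this, the assignment from the report would become something like adata.obs = align_obs(adata.obs_names, df), so a reversed index is reordered and an unrelated index raises instead of silently replacing obs_names.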
An example of when this would be an issue is plotting cell data colored by annotation. If the annotation doesn't correspond to the cell, the plot becomes meaningless.
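A small pandas-only sketch of that failure mode, using made-up values and a hypothetical "cluster" column: pairing rows by position attaches each annotation to the wrong cell, while pairing by label does not.

```python
import pandas as pd

# Hypothetical per-cell values, keyed by cell name in the original order.
expression = pd.Series([0.0, 1.0], index=["0", "1"], name="expr")
# Annotation with the same cell names in reversed order.
annotation = pd.DataFrame({"cluster": ["B", "A"]}, index=["1", "0"])

# Positional pairing: row i of the values gets row i of the annotation.
positional = pd.DataFrame(
    {
        "expr": expression.to_numpy(),
        "cluster": annotation["cluster"].to_numpy(),
    }
)

# Label-based pairing: pandas aligns both columns on the cell names.
by_label = pd.DataFrame({"expr": expression, "cluster": annotation["cluster"]})
```

In `positional`, the cell with value 0.0 ends up labeled "B", although its annotation by name is "A"; `by_label` keeps the correct pairing.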