Duplicates in index throw error during make_index_unique

Open lesolorzanov opened this issue 5 months ago • 1 comments

My system: Ubuntu 23.10, anndata version 0.10.5.post1

When working with anndata.var.index, the function make_index_unique in utils kept throwing a pandas-related error:

pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Stepping into the utils file, the underlying error turned out to be:

TypeError: Cannot setitem on a Categorical with a new category (X-1), set the categories first

It happens while the duplicates are being renamed. When the index is categorical, the values taken at https://github.com/scverse/anndata/blob/d7643e966b7cfaf8f5c732f1f020b0674db1def9/src/anndata/utils.py#L244 are a pd.Categorical, and the function then tries to assign the tentative new name to that pd.Categorical even though its categories do not include the tentative new name.
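The pandas behaviour behind this can be reproduced without anndata. This is only a minimal sketch with made-up values ("X", "Y"), not the code from utils.py; the exception type is caught broadly because it may differ between pandas versions.

```python
import pandas as pd

# A categorical whose categories are only ['X', 'Y'] (illustrative values)
values = pd.Categorical(["X", "Y", "X"])

error = None
try:
    # "X-1" is not one of the existing categories, so pandas refuses the assignment
    values[2] = "X-1"
except (TypeError, ValueError) as exc:  # exact type may depend on the pandas version
    error = exc

print(type(error).__name__, error)
```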

In summary, changing these lines to a more generic version solved the problem for me. I just wanted to share it in case someone else needs it.

For example, instead of values_dup = values[indices_dup], use values_dup = np.array(values[indices_dup]), so that the duplicates are held in a plain NumPy array rather than a pd.Categorical.

Then, once values_dup holds the new names, replace values with a copy of itself that contains the right categories, like this: values = pd.Categorical(values, categories=list(values.categories) + list(values_dup))
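Put together, the workaround might look like the sketch below. The helper name make_categorical_index_unique is hypothetical and the renaming loop is a simplified stand-in, not the actual anndata implementation; only the two changes described above (the NumPy copy of the duplicates and the rebuilt categories) come from this report.

```python
from collections import Counter

import numpy as np
import pandas as pd


def make_categorical_index_unique(index, join="-"):
    """Hypothetical helper: de-duplicate a categorical index by appending suffixes."""
    values = index.values.copy()                  # pd.Categorical
    indices_dup = index.duplicated(keep="first")  # boolean mask of repeated entries

    # Change 1: hold the duplicates in a plain object array, not a Categorical
    values_dup = np.array(values[indices_dup], dtype=object)

    seen = set(values)
    counter = Counter()
    for i, v in enumerate(values_dup):
        while True:
            counter[v] += 1
            candidate = f"{v}{join}{counter[v]}"
            if candidate not in seen:
                seen.add(candidate)
                values_dup[i] = candidate  # fine: plain array, no category check
                break

    # Change 2: rebuild the categorical so its categories include the new names
    values = pd.Categorical(
        values, categories=list(values.categories) + list(values_dup)
    )
    values[indices_dup] = values_dup
    return pd.Index(values, name=index.name)


index = pd.CategoricalIndex(["X", "Y", "X", "X"], name="gene")
unique_index = make_categorical_index_unique(index)
print(list(unique_index))
```

The categories have to be extended after the new names are generated; adding the original duplicate values instead would repeat existing categories, which pd.Categorical rejects.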

lesolorzanov avatar Sep 11 '24 12:09 lesolorzanov