scanpy
Issue with sanitize_anndata() in plotting functions when subsets of anndata objects are passed.
I get quite a strange scanpy error, which appears a bit stochastic... This has happened for the first time in version 1.1.
I am trying to get a scatter plot of a subsetted anndata object like this:
p4 = sc.pl.scatter(adata[adata.obs['n_counts']<10000 ,:], 'n_counts', 'n_genes', color='mt_frac')
When I do this the first time round, I get this error message about categorical variables from sanitize_anndata (none of which are actually used in the call).
AttributeError Traceback (most recent call last)
<ipython-input-66-fc1479c238f7> in <module>()
9 plt.show()
10
---> 11 p4 = sc.pl.scatter(adata[adata.obs['n_counts']<10000 ,:], 'n_counts', 'n_genes', color='mt_frac')
12 p5 = sc.pl.scatter(adata, 'n_counts', 'n_genes', color='mt_frac')
13
~/scanpy/scanpy/plotting/anndata.py in scatter(adata, x, y, color, use_raw, sort_order, alpha, basis, groups, components, projection, legend_loc, legend_fontsize, legend_fontweight, color_map, palette, right_margin, left_margin, size, title, show, save, ax)
162 show=show,
163 save=save,
--> 164 ax=ax)
165
166 elif x in adata.var_keys() and y in adata.var_keys() and color not in adata.obs_keys():
~/scanpy/scanpy/plotting/anndata.py in _scatter_obs(adata, x, y, color, use_raw, sort_order, alpha, basis, groups, components, projection, legend_loc, legend_fontsize, legend_fontweight, color_map, palette, right_margin, left_margin, size, title, show, save, ax)
281 ax=None):
282 """See docstring of scatter."""
--> 283 sanitize_anndata(adata)
284 if legend_loc not in VALID_LEGENDLOCS:
285 raise ValueError(
~/scanpy/scanpy/utils.py in sanitize_anndata(adata)
481 # backwards compat... remove this in the future
482 def sanitize_anndata(adata):
--> 483 adata._sanitize()
484
485
~/anndata/anndata/base.py in _sanitize(self)
1284 if len(c.categories) < len(c):
1285 df[key] = c
-> 1286 df[key].cat.categories = df[key].cat.categories.astype('U')
1287 logg.info(
1288 '... storing \'{}\' as categorical'
~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self, name)
3608 if (name in self._internal_names_set or name in self._metadata or
3609 name in self._accessors):
-> 3610 return object.__getattribute__(self, name)
3611 else:
3612 if name in self._info_axis:
~/anaconda3/lib/python3.6/site-packages/pandas/core/accessor.py in __get__(self, instance, owner)
52 # this ensures that Series.str.<method> is well defined
53 return self.accessor_cls
---> 54 return self.construct_accessor(instance)
55
56 def __set__(self, instance, value):
~/anaconda3/lib/python3.6/site-packages/pandas/core/categorical.py in _make_accessor(cls, data)
2209 def _make_accessor(cls, data):
2210 if not is_categorical_dtype(data.dtype):
-> 2211 raise AttributeError("Can only use .cat accessor with a "
2212 "'category' dtype")
2213 return CategoricalAccessor(data.values, data.index,
AttributeError: Can only use .cat accessor with a 'category' dtype
Then, I comment out the respective line of code, run the whole thing again, and it works. And when I uncomment the line it works fine again.
When I comment out the line for the first time, I get a couple of lines displayed in the output, saying something like:
... 'donor' was turned into a categorical variable
... 'gene_symbols' was turned into a categorical variable
My theory is that sanitize_anndata() detects that these variables should be categorical and tries to convert them into categoricals. As this sc.pl.scatter call is the first time sanitize_anndata() is called after the variables are read in, this is the first time the conversion would take place. However, I am calling sc.pl.scatter() on a subsetted anndata object, so it somehow cannot do the conversion. Once I have called sc.pl.scatter on a non-subsetted anndata object, the conversion can take place, and I can subsequently call sc.pl.scatter on a subsetted anndata object as well.
If this is true, I can see why this is happening. However, I feel this behaviour will be quite puzzling to a typical user. Maybe sanitize_anndata() should be called before plotting (probably hard to implement), or the plotting functions should have a parameter to plot only a subset of the data. That way sanitize_anndata() could be called on the whole anndata object every time, as there would no longer be a reason to pass a view of the object. You could then test whether a view is being passed to sanitize_anndata() and say "please don't pass subsetted anndata objects to plotting functions" or something like that.
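To make the theory above concrete, here is a minimal pandas-only sketch of the conversion step, assuming `adata.obs` behaves like an ordinary DataFrame. The helper `sanitize_frame` and the toy column values are hypothetical; the helper only mimics the `len(c.categories) < len(c)` check visible in the traceback and is not anndata's actual implementation. It shows the order that works: convert on the full table first, then subset.

```python
import pandas as pd


def sanitize_frame(df: pd.DataFrame) -> list:
    """Convert repetitive string columns to categoricals, in place.

    Hypothetical helper that imitates the check seen in the traceback;
    it is not the code used by anndata.
    """
    converted = []
    for key in df.columns:
        # Only string-like columns are candidates for conversion.
        if not pd.api.types.is_string_dtype(df[key]):
            continue
        c = pd.Categorical(df[key])
        # Fewer distinct values than rows: store the column as categorical.
        if len(c.categories) < len(c):
            df[key] = c
            converted.append(key)
    return converted


# Toy stand-in for adata.obs; the values are made up for illustration.
obs = pd.DataFrame(
    {
        "donor": ["d1", "d1", "d2", "d2"],
        "n_counts": [12000, 8000, 5000, 15000],
    }
)

# Convert on the full table first, then take the subset.
converted = sanitize_frame(obs)
subset = obs[obs["n_counts"] < 10000]

print(converted)
print(subset["donor"].dtype)
```

In scanpy terms, the analogous order would presumably be to call sc.utils.sanitize_anndata(adata) on the full object once and only then pass adata[...] to the plotting function.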
Yes, this is related to the fact that sanitize_anndata cannot be meaningfully applied to a view of AnnData. You're right that one should also account for this case... I'll give it a thought. At least there should be a proper error hinting people to call sc.utils.sanitize_anndata when trying the call you mention.
Thank you very much for pointing this out. :smile: It should have happened also before version 1.1, though.
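A rough sketch of what such a "proper error" could look like is given below. Everything here is an assumption for illustration: `require_full_object` is a hypothetical helper, `FakeAnnData` is a stand-in so the snippet runs without anndata installed, and the view flag is looked up as `is_view` with a fallback to `isview` because the attribute name may differ between anndata versions.

```python
class FakeAnnData:
    """Minimal stand-in for AnnData so the sketch runs on its own."""

    def __init__(self, is_view: bool):
        self.is_view = is_view


def require_full_object(adata):
    """Return `adata` unchanged, or raise a descriptive error for a view.

    Hypothetical guard; not part of scanpy or anndata.
    """
    # Assumption: the object exposes `is_view` (or `isview`) as a boolean.
    is_view = getattr(adata, "is_view", getattr(adata, "isview", False))
    if is_view:
        raise ValueError(
            "Cannot sanitize a view of AnnData. Call "
            "sc.utils.sanitize_anndata(adata) on the full object first, "
            "or pass a copy of the subset to the plotting function."
        )
    return adata


# A full object passes through; a view triggers the hint.
full = require_full_object(FakeAnnData(is_view=False))
try:
    require_full_object(FakeAnnData(is_view=True))
except ValueError as err:
    print(err)
```

A guard like this would turn the opaque `.cat accessor` AttributeError into a message that tells the user what to do next.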
I have something that might be related:
ad = ad[ad.obs['cell type'] != 'nan'].copy()
assert np.all(ad.obs['cell type'] != 'nan')
sc.utils.sanitize_anndata(ad)
assert np.all(ad.obs['cell type'] != 'nan')
This fails in the second assert:
AssertionError Traceback (most recent call last)
<ipython-input-103-2f44e51fdcae> in <module>
8 assert np.all(ad.obs['cell type'] != 'nan')
9 sc.utils.sanitize_anndata(ad)
---> 10 assert np.all(ad.obs['cell type'] != 'nan')
11
12
AssertionError:
It's really black magic, any ideas?
PS: the nans are really strings, not proper NaNs.
@gokceneraslan are there actually nans in there? Could be related to https://github.com/theislab/anndata/issues/141.
Yes there are, and this is how I realized it. I saw them in the plots and wondered why they show up after removing them.
Oh, you mean real NaNs? No, there are none.
I'm having this issue when I read in multiple AnnData objects and merge them with concat. I can't run any of the plotting functions because I get this error. I tried converting all object/string obs columns to categorical (except obs names), but I can't get around it at all.