scirpy
Errors when running ir.pl.clonotype_imbalance and sc.pl.umap of specific clonotypes
Thank you for the really great tool! I was trying to rerun the tutorial, but I got several errors. I would appreciate any help on what could have gone wrong. Two of them were while running ir.pl.clonotype_imbalance:
ir.pl.clonotype_imbalance(
    adata,
    replicate_col="sample",
    groupby="source",
    case_label="Tumor",
    plot_type="strip",
)
WARNING: Clonotype imbalance not found. Running `ir.tl.clonotype_imbalance` and storing under {key_added}
WARNING: Clonotype imbalance calculation depends on repertoire overlap. We could not detect any previous runs of repertoire_overlap, so the tool is running now...
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-74-9b5b8bddb872> in <module>
4 groupby="source",
5 case_label="Tumor",
----> 6 plot_type="strip",
7 )
~/anaconda3/lib/python3.7/site-packages/scirpy/_plotting/_clonotype_imbalance.py in clonotype_imbalance(adata, replicate_col, groupby, case_label, control_label, target_col, additional_hue, top_n, fraction, inplace, plot_type, key_added, xlab, ylab, title, **kwargs)
95 additional_hue=additional_hue,
96 fraction=fraction,
---> 97 key_added=key_added,
98 )
99
~/anaconda3/lib/python3.7/site-packages/scirpy/_tools/_clonotype_imbalance.py in clonotype_imbalance(adata, replicate_col, groupby, case_label, control_label, target_col, additional_hue, fraction, inplace, overlap_key, key_added)
119 for suspect in suspects:
120 p, logfoldchange, rel_case_sizes, rel_control_sizes = _calculate_imbalance(
--> 121 tdf1[suspect], tdf2[suspect], ncase, ncontrol, global_minimum
122 )
123 clt_stats.append([suspect, p, -np.log10(p), logfoldchange])
~/anaconda3/lib/python3.7/site-packages/scirpy/_tools/_clonotype_imbalance.py in _calculate_imbalance(case_sizes, control_sizes, ncase, ncontrol, global_minimum)
271 )
272 logfoldchange = np.log2(
--> 273 (case_mean_freq + global_minimum) / (control_mean_freq + global_minimum)
274 )
275 return p, logfoldchange, rel_case_sizes, rel_control_sizes
ZeroDivisionError: float division by zero
Second error:
ir.pl.clonotype_imbalance(
    adata,
    replicate_col="sample",
    groupby="source",
    case_label="Tumor",
    additional_hue="diagnosis",
    plot_type="volcano",
    fig_kws={"dpi": 120},
)
WARNING: Clonotype imbalance not found. Running `ir.tl.clonotype_imbalance` and storing under {key_added}
WARNING: Clonotype imbalance calculation depends on repertoire overlap. We could not detect any previous runs of repertoire_overlap, so the tool is running now...
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'diagnosis'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-75-f891a51d2311> in <module>
6 additional_hue="diagnosis",
7 plot_type="volcano",
----> 8 fig_kws={"dpi": 120},
9 )
~/anaconda3/lib/python3.7/site-packages/scirpy/_plotting/_clonotype_imbalance.py in clonotype_imbalance(adata, replicate_col, groupby, case_label, control_label, target_col, additional_hue, top_n, fraction, inplace, plot_type, key_added, xlab, ylab, title, **kwargs)
95 additional_hue=additional_hue,
96 fraction=fraction,
---> 97 key_added=key_added,
98 )
99
~/anaconda3/lib/python3.7/site-packages/scirpy/_tools/_clonotype_imbalance.py in clonotype_imbalance(adata, replicate_col, groupby, case_label, control_label, target_col, additional_hue, fraction, inplace, overlap_key, key_added)
97 # Create a series of case-control groups for comparison
98 case_control_groups = _create_case_control_groups(
---> 99 adata.obs, replicate_col, groupby, additional_hue, case_label, control_label
100 )
101
~/anaconda3/lib/python3.7/site-packages/scirpy/_tools/_clonotype_imbalance.py in _create_case_control_groups(df, replicate_col, groupby, additional_hue, case_label, control_label)
199 else:
200 group_cols.append(additional_hue)
--> 201 hues = df[additional_hue].unique()
202 df = df.groupby(group_cols, observed=True).agg("size").reset_index()
203
~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'diagnosis'
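The `KeyError: 'diagnosis'` above is raised at `df[additional_hue]`, which suggests that `adata.obs` has no column named `diagnosis`. A quick check before the call can make that explicit. The sketch below uses a small pandas DataFrame as a stand-in for `adata.obs`; the helper name `missing_obs_columns` and the example values are hypothetical and not part of scirpy.

```python
import pandas as pd


def missing_obs_columns(obs, required):
    """Return the names in `required` that are not columns of `obs`."""
    return [col for col in required if col not in obs.columns]


# Stand-in for adata.obs; column names follow the call above, values are made up.
obs = pd.DataFrame(
    {
        "sample": ["s1", "s1", "s2"],
        "source": ["Tumor", "Blood", "Tumor"],
    }
)

missing = missing_obs_columns(obs, ["sample", "source", "diagnosis"])
if missing:
    print(f"Missing from obs: {missing}")
```

With a real AnnData object, the same check would be run on `adata.obs` before passing `additional_hue="diagnosis"`.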
The third error occurred when trying to plot the top differential clonotypes between the CD8_Teff and CD8_Trm clusters on a UMAP:
freq, stat = ir.tl.clonotype_imbalance(
    adata,
    replicate_col="sample",
    groupby="cluster",
    case_label="CD8_Teff",
    control_label="CD8_Trm",
    inplace=False,
)
top_differential_clonotypes = stat["clonotype"].tolist()[:5]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4), gridspec_kw={"wspace": 0.6})
sc.pl.umap(adata, color="cluster", ax=ax1, show=False)
sc.pl.umap(
    adata,
    color="clonotype",
    groups=top_differential_clonotypes,
    ax=ax2,
    # increase size of highlighted dots
    size=[
        80 if c in top_differential_clonotypes else 30 for c in adata.obs["clonotype"]
    ],
)
TypeError Traceback (most recent call last)
<ipython-input-78-98b9b22b216a> in <module>
8 # increase size of highlighted dots
9 size=[
---> 10 80 if c in top_differential_clonotypes else 30 for c in adata.obs["clonotype"]
11 ],
12 )
~/anaconda3/lib/python3.7/site-packages/scanpy/plotting/_tools/scatterplots.py in umap(adata, **kwargs)
603 If `show==False` a :class:`~matplotlib.axes.Axes` or a list of it.
604 """
--> 605 return embedding(adata, 'umap', **kwargs)
606
607
~/anaconda3/lib/python3.7/site-packages/scanpy/plotting/_tools/scatterplots.py in embedding(adata, basis, color, gene_symbols, use_raw, sort_order, edges, edges_width, edges_color, neighbors_key, arrows, arrows_kwds, groups, components, layer, projection, scale_factor, color_map, cmap, palette, na_color, na_in_legend, size, frameon, legend_fontsize, legend_fontweight, legend_loc, legend_fontoutline, vmax, vmin, add_outline, outline_width, outline_color, ncols, hspace, wspace, title, show, save, ax, return_fig, **kwargs)
243 use_raw=use_raw,
244 gene_symbols=gene_symbols,
--> 245 groups=groups,
246 )
247 color_vector, categorical = _color_vector(
~/anaconda3/lib/python3.7/site-packages/scanpy/plotting/_tools/scatterplots.py in _get_color_source_vector(adata, value_to_plot, use_raw, gene_symbols, layer, groups)
1090 values = adata.obs_vector(value_to_plot, layer=layer)
1091 if groups and is_categorical_dtype(values):
-> 1092 values = values.replace(values.categories.difference(groups), np.nan)
1093 return values
1094
~/anaconda3/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in replace(self, to_replace, value, inplace)
2442 inplace = validate_bool_kwarg(inplace, "inplace")
2443 cat = self if inplace else self.copy()
-> 2444 if to_replace in cat.categories:
2445 if isna(value):
2446 cat.remove_categories(to_replace, inplace=True)
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in __contains__(self, key)
3898 @Appender(_index_shared_docs["contains"] % _index_doc_kwargs)
3899 def __contains__(self, key) -> bool:
-> 3900 hash(key)
3901 try:
3902 return key in self._engine
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in __hash__(self)
3905
3906 def __hash__(self):
-> 3907 raise TypeError(f"unhashable type: {repr(type(self).__name__)}")
3908
3909 def __setitem__(self, key, value):
TypeError: unhashable type: 'Index'
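The traceback shows scanpy passing a pandas `Index` of categories to `Categorical.replace`, which this pandas version rejects as unhashable. One possible workaround, sketched below on a stand-in DataFrame, is to build a separate column that keeps only the clonotypes of interest and sets everything else to missing, and then to color by that column without `groups`. The helper `keep_only`, the column name `clonotype_top`, and the toy values are hypothetical, and this has not been tested against the versions listed in this thread.

```python
import pandas as pd


def keep_only(values, wanted):
    """Return a categorical copy of `values` where entries not in `wanted` are NaN."""
    values = values.astype("category")
    masked = values.where(values.isin(wanted))
    # Drop categories that no longer occur so they do not show up in a legend.
    return masked.cat.remove_unused_categories()


# Stand-in for adata.obs with made-up clonotype labels.
obs = pd.DataFrame({"clonotype": ["10", "11", "12", "10", "13"]})
top_differential_clonotypes = ["10", "12"]

obs["clonotype_top"] = keep_only(obs["clonotype"], top_differential_clonotypes)
```

Assuming the new column is stored in `adata.obs`, `sc.pl.umap(adata, color="clonotype_top", ax=ax2)` should then highlight only the selected clonotypes.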
Here are the package versions I have:
scanpy==1.7.1 anndata==0.7.3 umap==0.4.4 numpy==1.18.1 scipy==1.4.1 pandas==1.0.1 scikit-learn==0.22.1 statsmodels==0.11.0 python-igraph==0.7.1 leidenalg==0.8.0 scirpy==0.6.1
Here is the whole code I used: Scirpy_tutorial_3kT_cancer.zip
Thanks in advance!
Hi @Movahedilab,
thanks for your report!
I wasn't able to reproduce the error using datasets.wu2020_3k(), the indicated versions and your notebook.
Since the input numbers of your notebook are not consistent, I assume that you may have done something with adata that is not part of the notebook and causes the error.
Or have you modified the numpy error handling, e.g. using seterr? Because I do get a warning about zero division, but not an error.
Can you bring down the problem to a minimal reproducible example?
Apart from that, I would be curious how you are planning to apply the clonotype_imbalance function to your data. We still consider this function experimental. I am currently working on an improved/modified version, and I would be interested to know if it still meets your use case.
Cheers, Gregor
Hi Gregor,
Thanks for the quick response!
I don't have much experience in Python, but I have attached what I hope is a reproducible example. I haven't changed anything in the numpy settings and I didn't make any modifications to the adata.
I have also tried to run the tutorial on my dataset, using the same kernel, and surprisingly pl.clonotype_imbalance worked, though sc.pl.umap of specific clonotypes still gave the same error (see the second attached notebook).
Regarding your question on my plans for clonotype_imbalance, I won't be able to use it for the data I am working on at the moment, as it doesn't have any expanded clonotypes. But for future datasets, I find this function very interesting for digging deeper into the differences between clonotypes.
I have another question/comment. In the second attached notebook (DL017-018-example.ipynb), when I plot "clonal_expansion" and "clonotype_size", I get cells from the 1, 2 and >=3 category:

However, in this dataset, mostly the cells in the upper cluster are B cells, and all the remaining cells do not have BCRs, so they should be in a category "0". These cells correctly have "has_ir"=False, but "clonal_expansion"=1:

Best, Daliya
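For the mismatch described above (cells with `has_ir` False but `clonal_expansion` 1), one way to make such cells stand out until the cause is clear is to mask the expansion category wherever `has_ir` is false. The sketch below does this on a stand-in DataFrame. The helper name, the `"0"` label, and the assumption that `has_ir` holds `"True"`/`"False"` values are illustrative and may not match what scirpy stores.

```python
import pandas as pd


def mask_expansion_without_ir(obs, label="0"):
    """Return clonal_expansion, using `label` for cells whose has_ir is not true."""
    # Works for both boolean and string-encoded has_ir columns (assumption).
    has_ir = obs["has_ir"].astype(str) == "True"
    expansion = obs["clonal_expansion"].astype(str)
    return expansion.where(has_ir, label).astype("category")


# Stand-in for adata.obs; values are made up.
obs = pd.DataFrame(
    {
        "has_ir": ["True", "False", "True", "False"],
        "clonal_expansion": ["1", "1", ">= 3", "1"],
    }
)
obs["clonal_expansion_masked"] = mask_expansion_without_ir(obs)
```

Plotting the masked column instead of `clonal_expansion` would then separate cells without a receptor from true singletons.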
(Sent by email in reply to Gregor Sturm's comment of Monday, March 1, 2021, on icbi-lab/scirpy issue #244.)
Hi Daliya,
unfortunately GitHub discards attachments sent by email. Could you please upload them using the web interface so that I can look into this?
Best, Gregor
Sorry, here they are: Examples_scirpy.zip
Thanks!
For me it runs through, although still with the warnings. Could you try to run
np.seterr(all="warn")
at the beginning of your notebook? I am starting to suspect you might have different default numpy settings, for whatever reason.
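One detail that may matter here: `np.seterr` only governs arithmetic on NumPy scalars and arrays. The message `float division by zero` in the first traceback is the wording that built-in Python floats use, and that error is not affected by `np.seterr`. The sketch below illustrates the difference. It is not scirpy code, and whether `case_mean_freq` and `global_minimum` end up as built-in floats in the failing environment is an assumption that would need checking.

```python
import numpy as np


def numpy_division(numerator, denominator, mode):
    """Divide as NumPy float64 scalars under the given error-handling mode."""
    with np.errstate(divide=mode, invalid=mode):
        return np.float64(numerator) / np.float64(denominator)


def python_division(numerator, denominator):
    """Divide as built-in floats; NumPy error settings have no influence here."""
    return float(numerator) / float(denominator)


# NumPy scalars: the result is inf unless errors are configured to raise.
quiet_result = numpy_division(1.0, 0.0, "ignore")

try:
    numpy_division(1.0, 0.0, "raise")
    numpy_error = None
except FloatingPointError as err:
    numpy_error = type(err).__name__

# Built-in floats: ZeroDivisionError regardless of NumPy settings.
try:
    python_division(1.0, 0.0)
    python_error = None
except ZeroDivisionError as err:
    python_error = str(err)
```

If the values in the failing run turn out to be built-in floats, checking `type(case_mean_freq)` and `type(global_minimum)` in both environments could explain why one machine warns and the other raises.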
Thanks for all the effort! I ran the same notebook with np.seterr(all="warn"): Scirpy_3kT_cancer_error_example.zip
Maybe different column types are the problem? https://github.com/pandas-dev/pandas/issues/17190
Yet that doesn't explain why it works for me, but not for you with the same versions. I'll keep this open and look into it when I have a bit more time.