scanpy
scanpy.tl.umap after bbknn
- [ ] I have checked that this issue has not already been reported.
- [ ] I have confirmed this bug exists on the latest version of scanpy.
- [ ] (optional) I have confirmed this bug exists on the master branch of scanpy.
Hi, I got an error when running tl.umap after bbknn normalisation... new in version 1.7.2
Minimal code sample (that we can copy&paste without having any data)
adata_bbknn = bbknn.bbknn(adata, batch_key = metacol, n_pcs = number_of_pcs_for_reduction,copy=True)
scanpy.tl.umap(adata_bbknn, min_dist=0.2, spread=2, n_components=3)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-73-a5a2e6833485> in <module>()
1 adata_bbknn = bbknn.bbknn(adata, batch_key = metacol, n_pcs = number_of_pcs_for_reduction,copy=True)
----> 2 scanpy.tl.umap(adata_bbknn, min_dist=0.2, spread=2, n_components=3)
/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/scanpy/tools/_umap.py in umap(adata, min_dist, spread, n_components, maxiter, alpha, gamma, negative_sample_rate, init_pos, random_state, a, b, copy, method, neighbors_key)
205 neigh_params.get('metric', 'euclidean'),
206 neigh_params.get('metric_kwds', {}),
--> 207 verbose=settings.verbosity > 3,
208 )
209 elif method == 'rapids':
/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/umap/umap_.py in simplicial_set_embedding(data, graph, n_components, initial_alpha, a, b, gamma, negative_sample_rate, n_epochs, init, random_state, metric, metric_kwds, output_metric, output_metric_kwds, euclidean_output, parallel, verbose)
1037 random_state,
1038 metric=metric,
-> 1039 metric_kwds=metric_kwds,
1040 )
1041 expansion = 10.0 / np.abs(initialisation).max()
/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/umap/spectral.py in spectral_layout(data, graph, dim, random_state, metric, metric_kwds)
304 random_state,
305 metric=metric,
--> 306 metric_kwds=metric_kwds,
307 )
308
/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/umap/spectral.py in multi_component_layout(data, graph, n_components, component_labels, dim, random_state, metric, metric_kwds)
191 random_state,
192 metric=metric,
--> 193 metric_kwds=metric_kwds,
194 )
195 else:
/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/umap/spectral.py in component_layout(data, n_components, component_labels, dim, random_state, metric, metric_kwds)
120 else:
121 distance_matrix = pairwise_distances(
--> 122 component_centroids, metric=metric, **metric_kwds
123 )
124
/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
/home/sguenth/.conda/envs/scRNAseq_analysis_1.6/lib/python3.7/site-packages/sklearn/metrics/pairwise.py in pairwise_distances(X, Y, metric, n_jobs, force_all_finite, **kwds)
1738 raise ValueError("Unknown metric %s. "
1739 "Valid metrics are %s, or 'precomputed', or a "
-> 1740 "callable" % (metric, _VALID_METRICS))
1741
1742 if metric == "precomputed":
ValueError: Unknown metric angular. Valid metrics are ['euclidean', 'l2', 'l1', 'manhattan', 'cityblock', 'braycurtis', 'canberra', 'chebyshev', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule', 'wminkowski', 'nan_euclidean', 'haversine'], or 'precomputed', or a callable
Versions
anndata 0.7.5 scanpy 1.7.2 sinfo 0.3.1
PIL 8.0.1 anndata 0.7.5 annoy NA bbknn NA cached_property 1.5.1 cairo 1.20.0 cffi 1.14.4 colorama 0.4.4 cycler 0.10.0 cython_runtime NA dateutil 2.8.1 decorator 4.4.2 get_version 2.1 h5py 3.1.0 igraph 0.8.3 ipykernel 5.3.4 ipython_genutils 0.2.0 joblib 0.17.0 kiwisolver 1.3.1 legacy_api_wrap 0.0.0 leidenalg 0.8.3 llvmlite 0.34.0 louvain 0.6.1 matplotlib 3.3.3 mpl_toolkits NA natsort 7.1.0 numba 0.51.2 numexpr 2.7.1 numpy 1.19.4 packaging 20.4 pandas 1.1.4 pexpect 4.8.0 pickleshare 0.7.5 pkg_resources NA prompt_toolkit 1.0.15 psutil 5.8.0 ptyprocess 0.6.0 pycparser 2.20 pygments 2.7.2 pyparsing 2.4.7 pytz 2020.4 scanpy 1.7.2 scipy 1.5.3 seaborn 0.11.0 setuptools_scm NA simplegeneric NA sinfo 0.3.1 six 1.15.0 sklearn 0.23.2 sphinxcontrib NA statsmodels 0.12.1 storemagic NA tables 3.6.1 texttable 1.6.3 tornado 6.1 traitlets 5.0.5 typing_extensions NA umap 0.4.6 wcwidth 0.2.5 zipp NA zmq 20.0.0
IPython 5.8.0 jupyter_client 6.1.7 jupyter_core 4.7.0
Python 3.7.8 | packaged by conda-forge | (default, Nov 27 2020, 19:24:58) [GCC 9.3.0] Linux-4.9.0-16-amd64-x86_64-with-debian-9.13 8 logical CPU cores
Session information updated at 2021-09-01 08:49
@guensen0 Did you ever solve this issue?
Just ran into the same problem and found a solution. I'll post it here in case anyone needs it.
Briefly: adata.uns['neighbors']['params']['metric'] = 'cosine' will do the trick (or choose any other valid metric).
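If you only want to touch the metric when it is actually the problematic one, a minimal sketch could look like this. The helper name `patch_neighbors_metric` is made up for illustration (it is not part of scanpy or bbknn), and a plain dict stands in for `adata.uns` so the snippet runs without any data. With a real object you would pass `adata_bbknn.uns` instead.

```python
def patch_neighbors_metric(uns, neighbors_key="neighbors", fallback="cosine"):
    """Replace an 'angular' metric entry with `fallback` and return the old value.

    Hypothetical helper, not part of scanpy or bbknn.
    """
    params = uns[neighbors_key]["params"]
    old_metric = params.get("metric")
    if old_metric == "angular":
        # Only overwrite the value that the traceback complains about.
        params["metric"] = fallback
    return old_metric


# Stand-in for adata.uns after bbknn (assumed layout, based on the key path
# used in the workaround above).
uns = {"neighbors": {"params": {"metric": "angular", "n_neighbors": 15}}}

previous = patch_neighbors_metric(uns)
print(previous, "->", uns["neighbors"]["params"]["metric"])
```

After patching, calling scanpy.tl.umap again should pick up the new metric name, at least that is what happened in my case.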
I'm not completely sure, but it seems to happen when the neighbour graph consists of more than one connected component. In that case umap needs to estimate the distances between the components. It takes the metric name from adata.uns['neighbors']['params']['metric'], but angular is not accepted by the pairwise_distances call that umap makes at that point (see the traceback), and that causes the error. The strange thing is that the example given by @guensen0 uses the defaults, and the default metric, at least now, is euclidean. Maybe it was different in the past. In any case, the trick above solved the problem for me.
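To check whether the graph really is split into several components, something like the following could be used. It is only a sketch on a toy matrix. With real data I would assume the graph is in `adata.obsp['connectivities']`, which is where recent scanpy versions keep it, but please check your own object.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Toy symmetric connectivity matrix: cells 0-1-2 are linked, cells 3-4 are
# linked, and nothing joins the two groups.
rows = np.array([0, 1, 1, 2, 3, 4])
cols = np.array([1, 0, 2, 1, 4, 3])
vals = np.ones(len(rows))
connectivities = csr_matrix((vals, (rows, cols)), shape=(5, 5))

# directed=False treats the graph as undirected, which matches a symmetric
# neighbour graph.
n_components, labels = connected_components(connectivities, directed=False)
print("connected components:", n_components)
```

If this reports more than one component on your real graph, umap has to place the components relative to each other, which is where the metric name gets used.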
Other options are: 1) make sure that the neighbour graph is fully connected (increase the number of neighbours); 2) use a metric that is supported by both bbknn and umap (almost all except angular).
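For the second option, one way to pick a metric up front is to compare candidates against the list printed in the ValueError above. This is a rough sketch: the set below is copied from that error message (sklearn 0.23.2 in the original report) and may differ in other versions, and `first_supported` is a made-up helper name. Whether bbknn accepts the chosen metric still has to be checked separately.

```python
# Metrics listed as valid in the ValueError above (sklearn 0.23.2 in that report).
SKLEARN_VALID_METRICS = {
    "euclidean", "l2", "l1", "manhattan", "cityblock", "braycurtis",
    "canberra", "chebyshev", "correlation", "cosine", "dice", "hamming",
    "jaccard", "kulsinski", "mahalanobis", "matching", "minkowski",
    "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener",
    "sokalsneath", "sqeuclidean", "yule", "wminkowski", "nan_euclidean",
    "haversine",
}


def first_supported(candidates, valid=SKLEARN_VALID_METRICS):
    """Return the first candidate metric found in `valid`, or None.

    Hypothetical helper for illustration only.
    """
    for metric in candidates:
        if metric in valid:
            return metric
    return None


# 'angular' is skipped because it is not in the list from the error message.
chosen = first_supported(["angular", "cosine", "euclidean"])
print(chosen)
```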