pyslingshot
pyslingshot copied to clipboard
Performance and reliability
Hi!
This issue is carried over from the slingshot repo.
My question there had focused on whether pyslingshot
was still under active development, thanks @mossjacob for the clarification on that it indeed is. Additionally, I raised two points:
- Speed of
pyslingshot
vs.slingshot
, with the former being more optimized but the latter benefitting greatly from theapprox_points
argument, making it faster in practice. If this functionality could be ported that would be amazing. - Reliability/user-friendliness: I have not managed to run
pyslingshot
on a dataset with ~15000 cells and 10 PCs as input. The remainder of this issue will address this point.
Unfortunately I cannot share the tested data set at this stage. However, here are some bits to give you some intuition of the data.
PCA
The data set was center-scaled and dimensionality reduced with PCA and we selected the first 10 PCs using the "elbow" method. Clustering is done with Leiden on a kNN graph built on those PCs. I am aware that we are not really seeing any clear differentiation trajectory, however, I am developing general methods and am using this (imperfect) dataset as proof of concept. Colors in this plot represent the clusters.
Running pyslingshot
My call to pyslingshot
looks like this:
from slingshot import Slingshot
slingshot = Slingshot(data, cluster_labels_onehot, start_node=start_node)
slingshot.fit(num_epochs=1)
Prints 8 times the following:
UserWarning:
The maximal number of iterations maxit (set to 20 by the program) allowed for finding a smoothing spline with fp=s has been reached: s too small.
There is an approximation returned but the corresponding weighted sum of squared residuals does not satisfy the condition abs(fp-s)/s < tol.
Before finally erroring out with:
----> 1 slingshot.fit(num_epochs=1)
File ~/.../slingshot/slingshot.py:166, in Slingshot.fit(self, num_epochs, debug_axes)
163 self.debug_plot_avg = False
165 if epoch == num_epochs - 1: # plot curves
--> 166 self.plotter.clusters(self.debug_axes[1, 1], s=2, alpha=0.5)
167 self.plotter.curves(self.debug_axes[1, 1], self.curves)
TypeError: 'NoneType' object is not subscriptable
Not sure why it's still trying to plot to the debug axes when I am not in debug mode. slingshot.debug_level
is 0
in this example.
So I retried with the debug options on, following this tutorial:
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
custom_xlim = (-12, 12)
custom_ylim = (-12, 12)
slingshot = Slingshot(data, cluster_labels_onehot, start_node=start_node, debug_level='verbose')
slingshot.fit(num_epochs=1, debug_axes=axes)
Click for the resulting error message
/.../scipy/interpolate/_fitpack2.py:280: UserWarning:
The maximal number of iterations maxit (set to 20 by the program)
allowed for finding a smoothing spline with fp=s has been reached: s
too small.
There is an approximation returned but the corresponding weighted sum
of squared residuals does not satisfy the condition abs(fp-s)/s < tol.
warnings.warn(message)
0%| | 0/1 [00:31<?, ?it/s]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In [31], line 7
4 # plt.setp(axes, xlim=custom_xlim, ylim=custom_ylim)
6 slingshot = Slingshot(adata.obsm["X_pca"][:,:10], cluster_labels_onehot, start_node=20, debug_level='verbose')
----> 7 slingshot.fit(num_epochs=1, debug_axes=axes)
File ~/.../slingshot/slingshot.py:147, in Slingshot.fit(self, num_epochs, debug_axes)
144 self.calculate_cell_weights()
146 # Fit principal curve for all lineages using existing curves
--> 147 self.fit_lineage_curves()
149 # Ensure starts at 0
150 for l_idx, lineage in enumerate(self.lineages):
File ~/.../slingshot/slingshot.py:258, in Slingshot.fit_lineage_curves(self)
256 path_from = (self.data[i][0], curve.points_interp[i][0])
257 path_to = (self.data[i][1], curve.points_interp[i][1])
--> 258 self.debug_axes[0, 1].plot(path_from, path_to, c='black', alpha=alphas[i])
259 self.debug_axes[0, 1].plot(curve.points_interp[curve.order, 0],
260 curve.points_interp[curve.order, 1], label=str(lineage))
262 d_sq, dist = curve.project_to_curve(self.data, curve.points_interp[curve.order])
File ~/.../matplotlib/axes/_axes.py:1662, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
1419 """
1420 Plot y versus x as lines and/or markers.
1421
(...)
1659 (``'green'``) or hex strings (``'#008000'``).
1660 """
1661 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1662 lines = [*self._get_lines(*args, data=data, **kwargs)]
1663 for line in lines:
1664 self.add_line(line)
File ~/.../matplotlib/axes/_base.py:311, in _process_plot_var_args.__call__(self, data, *args, **kwargs)
309 this += args[0],
310 args = args[1:]
--> 311 yield from self._plot_args(
312 this, kwargs, ambiguous_fmt_datakey=ambiguous_fmt_datakey)
File ~/.../matplotlib/axes/_base.py:544, in _process_plot_var_args._plot_args(self, tup, kwargs, return_kwargs, ambiguous_fmt_datakey)
542 return list(result)
543 else:
--> 544 return [l[0] for l in result]
File ~/.../matplotlib/axes/_base.py:544, in <listcomp>(.0)
542 return list(result)
543 else:
--> 544 return [l[0] for l in result]
File ~/.../matplotlib/axes/_base.py:537, in <genexpr>(.0)
534 else:
535 labels = [label] * n_datasets
--> 537 result = (make_artist(x[:, j % ncx], y[:, j % ncy], kw,
538 {**kwargs, 'label': label})
539 for j, label in enumerate(labels))
541 if return_kwargs:
542 return list(result)
File ~/.../matplotlib/axes/_base.py:351, in _process_plot_var_args._makeline(self, x, y, kw, kwargs)
349 default_dict = self._getdefaults(set(), kw)
350 self._setdefaults(default_dict, kw)
--> 351 seg = mlines.Line2D(x, y, **kw)
352 return seg, kw
File ~/.../matplotlib/_api/deprecation.py:454, in make_keyword_only.<locals>.wrapper(*args, **kwargs)
448 if len(args) > name_idx:
449 warn_deprecated(
450 since, message="Passing the %(name)s %(obj_type)s "
451 "positionally is deprecated since Matplotlib %(since)s; the "
452 "parameter will become keyword-only %(removal)s.",
453 name=name, obj_type=f"parameter of {func.__name__}()")
--> 454 return func(*args, **kwargs)
File ~/.../matplotlib/lines.py:393, in Line2D.__init__(self, xdata, ydata, linewidth, linestyle, color, gapcolor, marker, markersize, markeredgewidth, markeredgecolor, markerfacecolor, markerfacecoloralt, fillstyle, antialiased, dash_capstyle, solid_capstyle, dash_joinstyle, solid_joinstyle, pickradius, drawstyle, markevery, **kwargs)
389 self.set_markeredgewidth(markeredgewidth)
391 # update kwargs before updating data to give the caller a
392 # chance to init axes (and hence unit support)
--> 393 self._internal_update(kwargs)
394 self._pickradius = pickradius
395 self.ind_offset = 0
File ~/.../matplotlib/artist.py:1186, in Artist._internal_update(self, kwargs)
1179 def _internal_update(self, kwargs):
1180 """
1181 Update artist properties without prenormalizing them, but generating
1182 errors as if calling `set`.
1183
1184 The lack of prenormalization is to maintain backcompatibility.
1185 """
-> 1186 return self._update_props(
1187 kwargs, "{cls.__name__}.set() got an unexpected keyword argument "
1188 "{prop_name!r}")
File ~/.../matplotlib/artist.py:1162, in Artist._update_props(self, props, errfmt)
1159 if not callable(func):
1160 raise AttributeError(
1161 errfmt.format(cls=type(self), prop_name=k))
-> 1162 ret.append(func(v))
1163 if ret:
1164 self.pchanged()
File ~/.../matplotlib/artist.py:983, in Artist.set_alpha(self, alpha)
980 raise TypeError(
981 f'alpha must be numeric or None, not {type(alpha)}')
982 if alpha is not None and not (0 <= alpha <= 1):
--> 983 raise ValueError(f'alpha ({alpha}) is outside 0-1 range')
984 self._alpha = alpha
985 self.pchanged()
ValueError: alpha (nan) is outside 0-1 range
Through plt.show()
I can see that two of the four debug plots still ran through:
I can see that this may be caused by an erroneous linage. See the first entry of the following output
[np.isnan(slingshot.curves[i].pseudotimes_interp).any() for i in range(len(slingshot.curves))]
# [True, False, False, False, False, False, False, False]
The corresponding lineage is: Lineage[20, 3, 10, 14, 18]
Long story short: I cannot get pyslingshot
to run to completion in my hands, probably caused by this odd lineage there. When I export this exact data, run it through R's slingshot
with approx_points
set to default it finishes successfuly, and quicker than pyslingshot
takes to get to this error.
This account is in no way meant to be discouraging or aggresive, and I am sorry if this wall of text conveys frustration. I am trying to lay out my struggle, but want to emphasise that I appreciate your development of pyslingshot
. Also, I realise that I may have a very specific data set and usage here, which may well not fit into the scope of pyslingshot
. I am content wrapping R's slingshot
if necessary and in fact do so at the moment. But if there was a way to move past these issues and use approx_points
, I would definitely move to pyslingshot
to save the roundtrip to R.
Thanks! Best, Jesko
Dear Jesko,
Thank you so much for going into such detail! It will greatly help me when debugging this. I'll get back to you once I've done some investigating.
Best wishes, Jacob