Performance and reliability

Open jeskowagner opened this issue 2 years ago • 1 comments


This issue is carried over from the slingshot repo.

My question there focused on whether pyslingshot was still under active development; thanks @mossjacob for clarifying that it indeed is. Additionally, I raised two points:

  • Speed of pyslingshot vs. slingshot, with the former being more optimized but the latter benefitting greatly from the approx_points argument, making it faster in practice. If this functionality could be ported that would be amazing.
  • Reliability/user-friendliness: I have not managed to run pyslingshot on a dataset with ~15000 cells and 10 PCs as input. The remainder of this issue will address this point.
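As a rough illustration of the idea behind approx_points, here is a minimal sketch that resamples an ordered curve to a fixed number of points spaced evenly along its arc length. This is not code from slingshot or pyslingshot: the function name `resample_curve` and the default of 100 points are hypothetical, and the only assumption is that the curve is stored as an ordered (n_points, n_dims) array. The expected benefit is that later steps work on a small, fixed number of curve points instead of one point per cell.

```python
import numpy as np


def resample_curve(points, n_points=100):
    """Resample an ordered curve to n_points, evenly spaced in arc length.

    Hypothetical helper for illustration only; not part of pyslingshot.
    """
    points = np.asarray(points, dtype=float)
    # Length of each segment between consecutive curve points.
    segment_lengths = np.linalg.norm(np.diff(points, axis=0), axis=1)
    # Cumulative arc length at every original point, starting at 0.
    arc = np.concatenate([[0.0], np.cumsum(segment_lengths)])
    # Target positions along the curve, evenly spaced from start to end.
    targets = np.linspace(0.0, arc[-1], n_points)
    # Linearly interpolate each coordinate at the target positions.
    columns = [np.interp(targets, arc, points[:, j]) for j in range(points.shape[1])]
    return np.column_stack(columns)


# Toy example: a straight line sampled at 1000 points, reduced to 5.
line = np.column_stack([np.linspace(0, 1, 1000), np.linspace(0, 1, 1000)])
approx = resample_curve(line, n_points=5)
print(approx.shape)
```

With ~15000 cells, a curve held at one point per cell would shrink by roughly two orders of magnitude under such a scheme, though whether this matches slingshot's internal approach exactly is an assumption.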

Unfortunately I cannot share the tested data set at this stage. However, here are some bits to give you some intuition of the data.


The data set was center-scaled and dimensionality reduced with PCA, and we selected the first 10 PCs using the "elbow" method. Clustering is done with Leiden on a kNN graph built on those PCs. I am aware that we are not really seeing any clear differentiation trajectory; however, I am developing general methods and am using this (imperfect) dataset as a proof of concept. Colors in this plot represent the clusters. [image]

Running pyslingshot

My call to pyslingshot looks like this:

from slingshot import Slingshot
slingshot = Slingshot(data, cluster_labels_onehot, start_node=start_node)

Prints 8 times the following:

The maximal number of iterations maxit (set to 20 by the program) allowed for finding a smoothing spline with fp=s has been reached: s too small.
There is an approximation returned but the corresponding weighted sum of squared residuals does not satisfy the condition abs(fp-s)/s < tol.

Before finally erroring out with:

Though I can see that two of the four debug plots still ran through: [image]
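The wording of the repeated warning quoted above appears to match the message SciPy emits when fitting a smoothing spline (e.g. via scipy.interpolate.UnivariateSpline). To find out which fit triggers it, the warnings can be recorded instead of printed. Below is a minimal sketch using only the standard library; `run_and_collect_warnings` and `noisy_fit` are hypothetical names, and `noisy_fit` merely stands in for whatever call produces the warning.

```python
import warnings


def run_and_collect_warnings(func, *args, **kwargs):
    """Call func and return (result, list of warning messages it raised)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including repeats of the same message.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    return result, [str(w.message) for w in caught]


def noisy_fit():
    """Hypothetical stand-in for a spline fit that emits a warning."""
    warnings.warn("s too small", UserWarning)
    return "fit"


result, messages = run_and_collect_warnings(noisy_fit)
print(len(messages), "warning(s) captured")
```

Wrapping each per-lineage fit this way would show whether all 8 warnings come from the same lineage or from different ones.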

I suspect the failure is caused by an erroneous lineage; see the first entry of the following output:

[np.isnan(slingshot.curves[i].pseudotimes_interp).any() for i in range(len(slingshot.curves))]
# [True, False, False, False, False, False, False, False]

The corresponding lineage is: Lineage[20, 3, 10, 14, 18]
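For a more detailed view than the any() check above, the following sketch counts NaNs per array. It assumes only a sequence of numeric arrays, such as the pseudotimes_interp attribute of each curve shown above; the helper name `nan_report` is hypothetical, and the toy input is made up for illustration.

```python
import numpy as np


def nan_report(arrays):
    """Return (index, n_nan, n_total) for every array containing a NaN."""
    report = []
    for i, values in enumerate(arrays):
        values = np.asarray(values, dtype=float)
        n_nan = int(np.isnan(values).sum())
        if n_nan > 0:
            report.append((i, n_nan, int(values.size)))
    return report


# Toy input: the first array has one NaN, the second has none.
toy = [np.array([0.0, np.nan, 2.0]), np.array([0.0, 1.0, 2.0])]
report = nan_report(toy)
print(report)
```

On the real object, this could be called as nan_report([c.pseudotimes_interp for c in slingshot.curves]) to see whether the affected curve is entirely NaN or only partially.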

Long story short: I cannot get pyslingshot to run to completion in my hands, probably because of this odd lineage. When I export this exact data and run it through R's slingshot with approx_points set to its default, it finishes successfully, and quicker than pyslingshot takes to get to this error.

This account is in no way meant to be discouraging or aggressive, and I am sorry if this wall of text conveys frustration. I am trying to lay out my struggle, but want to emphasise that I appreciate your development of pyslingshot. Also, I realise that I may have a very specific data set and usage here, which may well not fit into the scope of pyslingshot. I am content wrapping R's slingshot if necessary and in fact do so at the moment. But if there was a way to move past these issues and use approx_points, I would definitely move to pyslingshot to save the roundtrip to R.

Thanks! Best, Jesko

Dear Jesko,

Thank you so much for going into such detail! It will greatly help me when debugging this. I'll get back to you once I've done some investigating.

Best wishes, Jacob

