Leland McInnes comments

Results: 492 comments of Leland McInnes

Fix numba call for deprecation of nopython=True

This is actually a nopython=False case; I think in practice it probably needs to be updated to use the python object context handling that has been added to numba. I'll...

ValueError: cannot assign slice from input of different size

This is actually an issue with pynndescent, which has since been fixed, but is waiting on a new release for the fix to propagate to PyPI. Hopefully that will be...

Jaccard transformation function clarification

When running umap it will either be the one you cite first, or [this one from pynndescent](https://github.com/lmcinnes/pynndescent/blob/master/pynndescent/distances.py#L204) for the most part. It is possible that for small datasets (the cutoff...
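For readers unsure what a Jaccard distance computes, here is a minimal pure-Python sketch of the standard definition on binary (zero / non-zero) vectors: one minus the size of the intersection over the size of the union. It is an illustration only, not a copy of the numba-compiled function linked above; the name `jaccard_distance` and the convention of returning 0.0 for two all-zero vectors are assumptions made for this sketch.

```python
def jaccard_distance(x, y):
    """Jaccard distance between two equal-length vectors.

    Non-zero entries are treated as set membership, so the result is
    1 - |intersection| / |union|. Hypothetical helper for illustration.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    union_size = 0         # positions where at least one vector is non-zero
    intersection_size = 0  # positions where both vectors are non-zero
    for a, b in zip(x, y):
        a_set = a != 0
        b_set = b != 0
        if a_set or b_set:
            union_size += 1
        if a_set and b_set:
            intersection_size += 1

    # Assumed convention: two empty sets are at distance 0.
    if union_size == 0:
        return 0.0
    return (union_size - intersection_size) / union_size


if __name__ == "__main__":
    print(jaccard_distance([1, 1, 0, 0], [1, 0, 1, 0]))
```

In this sketch two vectors that share no non-zero positions are at distance 1.0, and identical vectors are at distance 0.0.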

ValueError: cannot assign slice from input of different size

I believe this is an issue relating to caching compilation in pynndescent. If you reinstall pynndescent, preferably directly from github, it should resolve the issue. To be clear, however, if...

Changes in Spectral Layout

While I appreciate the work, and the speedup would be helpful, I am wary of a new dependency on a binary library like this -- I also note you have...

Aligned UMAP: clarification on 'overlapping' points through time?

Perhaps the more complex, but also more realistic [example based on US congressional voting](https://umap-learn.readthedocs.io/en/latest/aligned_umap_politics_demo.html) may help a little. Generally the goal is to either have some specific identity that...

AlignedUMAP: transform new observations

Theoretically it is possible, but there certainly isn't code for handling such a case currently. It would be a non-trivial project to get something written that would actually do a...

Fix relations_dictionary problems that prevent aligned_umap from updating correctly

Sorry for being slow; I'm trying to loop back to this now, and I'll try to review it soon.

Very high memory usage

For a dataset that large I think you’ll be much better off using an index that uses quantization such as LanceDB or DiskANN. On Fri, Mar 22, 2024 at 2:03 ...

Very high memory usage

That makes sense if you just want a kNN graph. The process you used is similar to the dask-distributed pynndescent that exists [here](https://github.com/lmcinnes/pynndescent/blob/distributed/pynndescent/distributed_nndescent.py) in the ``distributed`` branch. If there is...