Leland McInnes

492 comments by Leland McInnes

This is actually a `nopython=False` case; I think in practice it probably needs to be updated to use the Python object context handling that has been added to numba. I'll...

This is actually an issue with pynndescent, which has since been fixed, but is waiting on a new release for the fix to propagate to PyPI. Hopefully that will be...

When running umap it will either be the one you cite first, or [this one from pynndescent](https://github.com/lmcinnes/pynndescent/blob/master/pynndescent/distances.py#L204) for the most part. It is possible that for small datasets (the cutoff...

I believe this is an issue relating to caching compilation in pynndescent. If you reinstall pynndescent, preferably directly from github, it should resolve the issue. To be clear, however, if...

While I appreciate the work, and the speedup would be helpful, I am wary of a new dependency on a binary library like this -- I also note you have...

Perhaps the more complex, but also more realistic, [example based on US congressional voting](https://umap-learn.readthedocs.io/en/latest/aligned_umap_politics_demo.html) may help a little. Generally the goal is to either have some specific identity that...

Theoretically it is possible, but there certainly isn't code for handling such a case currently. It would be a non-trivial project to get something written that would actually do a...

Sorry for being slow; I'm trying to loop back to this now, and I'll try to review it soon.

For a dataset that large I think you’ll be much better off using an index that uses quantization such as LanceDB or diskANN. On Fri, Mar 22, 2024 at 2:03 ...

That makes sense if you just want a kNN graph. The process you used is similar to the dask-distributed pynndescent that exists [here](https://github.com/lmcinnes/pynndescent/blob/distributed/pynndescent/distributed_nndescent.py) in the ``distributed`` branch. If there is...
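For readers unfamiliar with the term, "just a kNN graph" means: for each point, the indices of its k nearest neighbours. The sketch below computes that graph exactly by brute force, using Euclidean distance and pure-Python lists. The function name `knn_graph` is hypothetical and is not part of pynndescent. This approach is O(n²) in the number of points. pynndescent (including the distributed variant linked above) approximates this graph with nearest neighbour descent instead of comparing every pair.

```python
import heapq
import math
from typing import List, Sequence


def knn_graph(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Hypothetical helper: exact brute-force kNN graph.

    Returns, for each point, the indices of its k nearest neighbours
    (excluding the point itself), nearest first. Ties in distance are
    broken by the smaller index.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    graph: List[List[int]] = []
    for i, p in enumerate(points):
        # (distance, index) pairs from point i to every other point
        candidates = ((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        # keep the k smallest pairs; heapq.nsmallest returns them sorted ascending
        nearest = heapq.nsmallest(k, candidates)
        graph.append([j for _, j in nearest])
    return graph


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 12.0)]
    print(knn_graph(pts, 2))  # [[1, 2], [0, 2], [0, 1], [2, 1]]
```

The quadratic cost is why this is only practical for small datasets. For large ones, an approximate method such as NN-descent is the usual choice.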