James Melville comments

Results: 80 comments



Range scale input before optimization

This is now available in `uwot` on the `master` branch if you set `init_sdev = "range"`, which admittedly makes very little sense in terms of a standard deviation, but saves...
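Range scaling here means rescaling each column of the initial coordinates to a fixed range rather than to a target standard deviation. The sketch below shows min-max scaling of each column into [0, 10]. The helper `range_scale` and the target range of 10 are hypothetical illustrations, not `uwot`'s actual code or defaults.

```python
import numpy as np


def range_scale(coords: np.ndarray, max_coord: float = 10.0) -> np.ndarray:
    """Min-max scale each column of coords into [0, max_coord].

    Hypothetical helper: the default range of 10 is an assumption for
    illustration, not a documented uwot setting.
    """
    coords = np.asarray(coords, dtype=float)
    col_min = coords.min(axis=0)
    col_range = coords.max(axis=0) - col_min
    # Constant columns would divide by zero; map them to 0 instead.
    col_range[col_range == 0] = 1.0
    return max_coord * (coords - col_min) / col_range


# Example: a small 2D initialization with very different column spreads.
init = np.array([[0.1, -3.0], [0.3, 1.0], [0.2, 5.0]])
scaled = range_scale(init)
```

After scaling, both columns span the same range regardless of how spread out they were originally. That is the point of scaling by range rather than by standard deviation.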

Initialize queryable index with init_graph?

@lmcinnes this is tangentially related to the current discussion here about accuracy due to nearest neighbor descent, so can I just confirm that the current way pynndescent works is: To...

A challenging dataset for pynndescent

I appreciate the comments and suggestions @atarashansky. I don't currently have the bandwidth to do any clustering on the dataset, but it's a good idea. I've not looked at the...

A challenging dataset for pynndescent

Unfortunately, I did the majority of these calculations and data-processing in R, where I don't have access to nearest neighbor routines with correlation distance, so I'll have to get back...

A challenging dataset for pynndescent

[NN-Descent on High-Dimensional Data](https://doi.org/10.1145/3227609.3227643) (further expanded in [The Influence of Hubness on NN-Descent](https://doi.org/10.1142/S0218213019600029)) makes the case that NN Descent does badly in datasets with lots of "hubs": nodes that show...
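One common way to quantify hubs is the k-occurrence of each point: the number of other points' k-nearest-neighbor lists it appears in. A strongly right-skewed k-occurrence distribution indicates that a few points are hubs. The sketch below makes two assumptions: neighbor indices are stored in an (N, k) integer array, and self-neighbors have already been removed. The helper names are hypothetical.

```python
import numpy as np


def k_occurrence(knn_idx: np.ndarray, n_points: int) -> np.ndarray:
    """Count how many k-NN lists each point appears in."""
    return np.bincount(knn_idx.ravel(), minlength=n_points)


def hubness_skew(counts: np.ndarray) -> float:
    """Skewness of the k-occurrence distribution; large positive values suggest hubs."""
    centered = counts - counts.mean()
    std = counts.std()
    if std == 0:
        # Every point appears equally often: no hubs at all.
        return 0.0
    return float(np.mean(centered ** 3) / std ** 3)


# Example: with k = 1, point 0 is the nearest neighbor of every other point.
knn = np.array([[1], [0], [0], [0], [0], [0]])
counts = k_occurrence(knn, n_points=6)
print(counts)  # [5 1 0 0 0 0]
print(hubness_skew(counts) > 0)  # True
```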

A challenging dataset for pynndescent

The two strategies that stood out are:
* increase the number of neighbors internally and scale down the sample rate to try and equalize the total computation time.
* calculate...
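Here is one way to make the first strategy concrete. Suppose each iteration's cost is driven mainly by how many candidates each point samples, roughly ρ·k for sample rate ρ and k neighbors. Doubling k internally then means halving ρ to keep the candidate count, and so the cost, about the same. The sketch below works under that simplifying cost assumption. `equalized_sample_rate` is a hypothetical helper, not a pynndescent parameter.

```python
def equalized_sample_rate(rho: float, k: int, k_internal: int) -> float:
    """Scale the sample rate so that rho * k candidates per point stay constant.

    Assumes, as a simplification, that the cost of one NN-descent iteration
    is driven mainly by how many candidates each point samples (rho * k),
    so keeping that product fixed keeps the cost roughly fixed.
    """
    if k <= 0 or k_internal <= 0:
        raise ValueError("neighbor counts must be positive")
    return rho * k / k_internal


# Example: the user asks for 15 neighbors, but 30 are used internally.
rho_internal = equalized_sample_rate(rho=1.0, k=15, k_internal=30)
candidates_per_point = rho_internal * 30  # same as 1.0 * 15
```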

Poor performance on ann-benchmarks

I have recently put a bit more work into a C++ version of nearest neighbor descent, for eventual use with `uwot`. I have, of course, shamelessly copied the code...

Poor performance on ann-benchmarks

Thank you for all these details, especially about the memory use. From the sound of things, I may have to abandon the set-based high memory approach entirely for now. In...

Question about the implementation: reverse graph and building candidates

> What is the expected size of current_graph? Is it (N, K)?

Yes, the `current_graph` has dimensions N x k.

> Where is the reverse graph being computed?...
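NN-descent draws its candidates from each point's forward neighbors plus its reverse neighbors, meaning the points that list it as a neighbor. Given an N x k `current_graph` of neighbor indices, a reverse graph can be built in one pass. The sketch below is not pynndescent's implementation. The helper name `reverse_graph` is hypothetical, and treating `-1` as an empty neighbor slot is an assumption.

```python
import numpy as np


def reverse_graph(current_graph: np.ndarray) -> list[list[int]]:
    """For each point j, list the points i that have j in their k-NN list."""
    n = current_graph.shape[0]
    reverse = [[] for _ in range(n)]
    for i, neighbors in enumerate(current_graph):
        for j in neighbors:
            if j >= 0:  # assumption: -1 marks an unfilled slot
                reverse[j].append(i)
    return reverse


# Example: 3 points, k = 2; point 2 has only one neighbor filled in so far.
graph = np.array([[1, 2], [0, 2], [0, -1]])
rev = reverse_graph(graph)
```

Reverse lists are not bounded by k, so a hub can collect far more entries than its forward list holds. NN-descent implementations typically sample reverse neighbors down before the local join.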

When distance computation is expensive how to gradually build graph

Others may have better advice but here is my perspective (FWIW I have spent a reasonable chunk of my spare time looking at how approximate nearest neighbors and nearest neighbor...