pygsp icon indicating copy to clipboard operation
pygsp copied to clipboard

much improved NN graph

Open mdeff opened this issue 6 years ago • 16 comments

Contains PR #21 with some improvements.

Major fixes:

  • correct symmetrization of distance matrix (695272b1b73e2f55b0f31fc351121d82470af576)
  • squared width in gaussian kernel (f544e1e8d82b3b9faf9f58e61807a1494dc5555b)
  • FLANN returns distance^p for a p-norm (8cc3539d920a7a97c9a4050e8ff6eca402321642)

Major improvements:

  • much better and complete documentation
  • better and more complete tests
  • feature standardization (bf7427fa96dd522f26274ebd9569c6ec631f7a6f)
  • radius estimation (26e12e37ed5be39ad9db36f7953d0b95c52263fa)
  • user choice of similarity kernel (9c8e86e3bb7b8131de7570b7a247f0b365818887)

Ohers:

  • width = radius / 2 for radius graphs (011dc8159e1e7ee816d02a008b494e935d4b65fb)

mdeff avatar Feb 25 '19 00:02 mdeff

@nperraud, can you take a look?

mdeff avatar Feb 25 '19 00:02 mdeff

Coverage Status

Coverage increased (+0.8%) to 88.508% when pulling 3212350649d51721534e13ccec3d58c9a8cee029 on naspert-nn_refactor into 49ff67918d7d7b723cfa58d1bee017809627bd6b on master.

coveralls avatar Feb 25 '19 00:02 coveralls

Thanks for the review @nperraud. I'll address it soon.

While we're at it, I'm thinking about integrating knn and radius graphs. I'd remove the kind parameter, and rename k to max_neighbors and radius to max_distance. Then:

  • a knn graph would be built with max_neigbhors=k and max_distance=np.inf
  • a radius graph would be built with max_neigbhors=np.inf and max_distance=radius
  • anything in the middle is now possible!

mdeff avatar Feb 27 '19 16:02 mdeff

Another problem that I see is that the installation of the library for approximate nn is not well documented...

nperraud avatar Jul 18 '19 09:07 nperraud

I have just added the function I coded to this branch to get the nearest neighbor search out of the Knn graph. The Knn graph function still needs to be modified.

nperraud avatar Jul 22 '19 09:07 nperraud

Thanks! Can you fix the tests?

mdeff avatar Jul 23 '19 09:07 mdeff

I tried, but failed. They work on my machine but the seed is different. I have to think about how to do it... Maybe you can help me. Let us discuss later...

nperraud avatar Jul 23 '19 09:07 nperraud

Tests are passing... The coverage is reduced (probably less raise Error check...) but I think it will increase again once the functions in _nearest_neighboors are cleaned. I wil do that before my vacation...

Currently the code is not very clean...

nperraud avatar Jul 26 '19 12:07 nperraud

Ok Should be ready for the merge...

nperraud avatar Aug 17 '19 11:08 nperraud

@mdeff Everything is done. For me, it can be merged to master.

nperraud avatar Aug 19 '19 09:08 nperraud

Very nice to see more efficient knn search implementations for knn graph building! Is there anything still missing before merging the PR? Do you have a rough idea of when this will be merged? Thanks for all the hard work on this very useful python package!

gcuendet avatar Oct 17 '19 10:10 gcuendet

Thanks for the kind words @gcuendet. I mostly need to address review comments. I would also like to merge knn and radius graphs. I'll probably get to this in the not too far future as two other PRs (#55 and #57) depend on it. Then I'll make a new release.

mdeff avatar Oct 17 '19 11:10 mdeff

  • [x] rebase or merge? Too long history for rebase, and would make things even more difficult for sphere-graphs and learned-graph.
  • [ ] NNGraph is a bad name: Graph is redundant, and NN isn't explicit. pg.graphs.NearestNeighbors? (Also suggested in https://github.com/epfl-lts2/pygsp/pull/57#issuecomment-552178696.) Other ideas?
  • [ ] kernel_width => scale? See https://github.com/epfl-lts2/pygsp/pull/43#discussion_r259798956
  • [ ] squash splitting of nn functions (for learnedgraph) and move the file to pygsp.graphs
  • [ ] update twomoons (kwargs to NNGraph, variance?), see https://github.com/epfl-lts2/pygsp/pull/43#discussion_r546016646 and https://github.com/epfl-lts2/pygsp/pull/43#discussion_r259805805
  • [ ] good automatic scale for radius graph, see https://github.com/epfl-lts2/pygsp/pull/43#pullrequestreview-207356262 and https://github.com/epfl-lts2/pygsp/pull/43#discussion_r259802347
  • [ ] merge metric and order parameters => it's minkowski if int, names are aliases
  • [ ] remove kind, see https://github.com/epfl-lts2/pygsp/pull/43#issuecomment-467938475
  • [ ] address the issue raised by @gcuendet

mdeff avatar Dec 18 '20 19:12 mdeff

I suggest the following:

  • NearestNeighbors instead of NNGraph
  • I find all possibilities bad for the variable kernel_width because it is linked to the gaussian function. So I am not against scale, but I am not pushing for it...
  • Do yo want to do learn graph is the same PR?

nperraud avatar Dec 19 '20 14:12 nperraud

  • Let's go with NearestNeighbors then. :)
  • Agreed. Some synonyms for ideas: spread, extent, size, width, span, stretch. Maybe we should think of it as a step before the kernel, as in the end we do similarity = kernel(distance / scale). So it's acting on the distances. What about something that "adjusts" the metric? Like expansion, contraction, inflation, amplification, growth, dilation. Contraction of space/distances = expansion of the kernel.
  • learned-graph stays in its own PR, as does sphere-graphs.

mdeff avatar Dec 19 '20 15:12 mdeff

Seeing all your proposition, I believe scale is the best option. It is rescaling or renormalisation, but I would prefer not to use something like norm because it is confusing.

nperraud avatar Dec 19 '20 17:12 nperraud