pygsp
pygsp copied to clipboard
much improved NN graph
Contains PR #21 with some improvements.
Major fixes:
- correct symmetrization of distance matrix (695272b1b73e2f55b0f31fc351121d82470af576)
- squared width in gaussian kernel (f544e1e8d82b3b9faf9f58e61807a1494dc5555b)
- FLANN returns distance^p for a p-norm (8cc3539d920a7a97c9a4050e8ff6eca402321642)
Major improvements:
- much better and complete documentation
- better and more complete tests
- feature standardization (bf7427fa96dd522f26274ebd9569c6ec631f7a6f)
- radius estimation (26e12e37ed5be39ad9db36f7953d0b95c52263fa)
- user choice of similarity kernel (9c8e86e3bb7b8131de7570b7a247f0b365818887)
Ohers:
- width = radius / 2 for radius graphs (011dc8159e1e7ee816d02a008b494e935d4b65fb)
@nperraud, can you take a look?
Coverage increased (+0.8%) to 88.508% when pulling 3212350649d51721534e13ccec3d58c9a8cee029 on naspert-nn_refactor into 49ff67918d7d7b723cfa58d1bee017809627bd6b on master.
Thanks for the review @nperraud. I'll address it soon.
While we're at it, I'm thinking about integrating knn and radius graphs. I'd remove the kind parameter, and rename k to max_neighbors and radius to max_distance. Then:
- a knn graph would be built with
max_neigbhors=kandmax_distance=np.inf - a radius graph would be built with
max_neigbhors=np.infandmax_distance=radius - anything in the middle is now possible!
Another problem that I see is that the installation of the library for approximate nn is not well documented...
I have just added the function I coded to this branch to get the nearest neighbor search out of the Knn graph. The Knn graph function still needs to be modified.
Thanks! Can you fix the tests?
I tried, but failed. They work on my machine but the seed is different. I have to think about how to do it... Maybe you can help me. Let us discuss later...
Tests are passing... The coverage is reduced (probably less raise Error check...) but I think it will increase again once the functions in _nearest_neighboors are cleaned. I wil do that before my vacation...
Currently the code is not very clean...
Ok Should be ready for the merge...
@mdeff Everything is done. For me, it can be merged to master.
Very nice to see more efficient knn search implementations for knn graph building! Is there anything still missing before merging the PR? Do you have a rough idea of when this will be merged? Thanks for all the hard work on this very useful python package!
Thanks for the kind words @gcuendet. I mostly need to address review comments. I would also like to merge knn and radius graphs. I'll probably get to this in the not too far future as two other PRs (#55 and #57) depend on it. Then I'll make a new release.
- [x] rebase or merge? Too long history for rebase, and would make things even more difficult for sphere-graphs and learned-graph.
- [ ]
NNGraphis a bad name: Graph is redundant, and NN isn't explicit.pg.graphs.NearestNeighbors? (Also suggested in https://github.com/epfl-lts2/pygsp/pull/57#issuecomment-552178696.) Other ideas? - [ ]
kernel_width=>scale? See https://github.com/epfl-lts2/pygsp/pull/43#discussion_r259798956 - [ ] squash splitting of nn functions (for learnedgraph) and move the file to
pygsp.graphs - [ ] update
twomoons(kwargstoNNGraph,variance?), see https://github.com/epfl-lts2/pygsp/pull/43#discussion_r546016646 and https://github.com/epfl-lts2/pygsp/pull/43#discussion_r259805805 - [ ] good automatic
scalefor radius graph, see https://github.com/epfl-lts2/pygsp/pull/43#pullrequestreview-207356262 and https://github.com/epfl-lts2/pygsp/pull/43#discussion_r259802347 - [ ] merge
metricandorderparameters => it's minkowski ifint, names are aliases - [ ] remove
kind, see https://github.com/epfl-lts2/pygsp/pull/43#issuecomment-467938475 - [ ] address the issue raised by @gcuendet
I suggest the following:
NearestNeighborsinstead ofNNGraph- I find all possibilities bad for the variable
kernel_widthbecause it is linked to the gaussian function. So I am not againstscale, but I am not pushing for it... - Do yo want to do learn graph is the same PR?
- Let's go with
NearestNeighborsthen. :) - Agreed. Some synonyms for ideas:
spread,extent,size,width,span,stretch. Maybe we should think of it as a step before the kernel, as in the end we dosimilarity = kernel(distance / scale). So it's acting on the distances. What about something that "adjusts" the metric? Like expansion, contraction, inflation, amplification, growth, dilation. Contraction of space/distances = expansion of the kernel. - learned-graph stays in its own PR, as does sphere-graphs.
Seeing all your proposition, I believe scale is the best option. It is rescaling or renormalisation, but I would prefer not to use something like norm because it is confusing.