pqtable
Python wrappers / inclusion into ANN-Benchmarks
Hi!
I just found your paper and repo, nice work! We are running a fairly large benchmark of ANN algorithms (https://github.com/erikbern/ann-benchmarks) and would like to include your implementation!
It would be easiest to include your algorithm if you provided Python wrappers for your code, for example using https://github.com/pybind/pybind11. This could also draw many more users to your algorithm, since many users of ANN algorithms code in Python.
Are there any plans to include a python wrapper? I could write the wrapper from our benchmark afterwards. Moreover, we also have a development branch where we allow inclusion from other programming languages by means of implementing a protocol. The development branch is here https://github.com/maumueller/ann-benchmarks and wrappers look like this https://github.com/maumueller/ann-benchmarks/blob/master/install/lib-dolphinn.cpp.
Best, Martin
Hi @maumueller,
Thanks for inviting me, and I'd love to include my algorithm in your benchmark! I guess it'd be best to write the Python wrappers with pybind11. As I'm busy this month, I'll work on it in Jan or Feb next year.
Best, Yusuke
Hi @matsui528,
great to hear! Please keep me up-to-date and ping me if you have questions w.r.t. the benchmark.
Best, Martin
Hi @matsui528,
a small update here: We are currently adding support for other languages to ann-benchmarks, and I chose your implementation to play around with a little bit. (Basically it boils down to implementing a wrapper like this: https://github.com/maumueller/pqtable/blob/master/wrapper/wrapper.cpp. We are still working on making it more easily accessible.)
I noticed that while there is support for top-k queries, there doesn't seem to be a parameter to improve the quality of the results. (I think I read something about that in your paper, but I couldn't find it as an option in your code.) E.g., I was expecting to see an option that gets the top-k' data points from the hash tables and then chooses the k closest of them through exact distance computations.
Since this option is missing, we only get a single result for each dataset, and its quality varies quite a bit depending on the dataset. Would be great to have an option that affects the result quality, maybe as sketched above?
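To make the idea concrete, here is a rough Python sketch of the re-ranking step. It is only an illustration under assumptions: `approx_search`, `search_with_rerank`, and `k_prime` are made-up names and not part of the PQTable code, and it assumes the original vectors are kept in memory.

```python
import heapq


def squared_l2(a, b):
    """Exact squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_with_rerank(query, approx_search, vectors, k, k_prime):
    """Fetch k_prime candidates, then keep the k closest by exact distance.

    approx_search(query, n) is a hypothetical stand-in for the index's
    top-n query; it must return a list of ids into `vectors`.
    """
    # Never ask for fewer candidates than we want to return.
    candidates = approx_search(query, max(k, k_prime))
    # Re-rank the candidates using the original (uncompressed) vectors.
    return heapq.nsmallest(
        k, candidates, key=lambda i: squared_l2(query, vectors[i])
    )


# Toy data: a fake "index" that returns ids in a fixed, imperfect order.
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.1, 0.0]]


def fake_approx_search(query, n):
    return [2, 1, 3, 0][:n]


query = [0.0, 0.0]
without_rerank = search_with_rerank(query, fake_approx_search, vectors, k=2, k_prime=2)
with_rerank = search_with_rerank(query, fake_approx_search, vectors, k=2, k_prime=4)
```

Raising `k_prime` trades query time for result quality, which would give the benchmark several points per dataset instead of one.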
Any thoughts?
Best, Martin
Hi @maumueller,
PQTable doesn't have any runtime parameters. This is by design, because I don't want to bother users with lots of parameters :)
As you suggested, late checking through a comparison to the original vectors is one direction. But currently I don't plan to do so, because managing the original vectors takes additional memory. PQTable was originally developed to handle billion-scale data, and keeping billion-scale original vectors in memory is prohibitively expensive. Using the search w/ late checking for million-scale data and the search w/o late checking for billion-scale data could be a solution, but the design would be a bit difficult (PR is welcome).
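For a rough sense of scale, here is a back-of-the-envelope calculation in Python. The numbers are illustrative assumptions, not measurements: 128-dimensional float32 vectors and 8-byte PQ codes per vector.

```python
# Assumed sizes, for illustration only.
num_vectors = 10**9        # one billion database vectors
dim = 128                  # assumed dimensionality
bytes_per_float = 4        # float32
bytes_per_code = 8         # assumed PQ code length per vector

# Memory for the original vectors vs. the PQ codes alone.
raw_bytes = num_vectors * dim * bytes_per_float
code_bytes = num_vectors * bytes_per_code


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


print(f"original vectors: {to_gib(raw_bytes):.1f} GiB")
print(f"PQ codes only:    {to_gib(code_bytes):.1f} GiB")
print(f"ratio:            {raw_bytes // code_bytes}x")
```

Under these assumptions the original vectors need 64 times more memory than the codes, which is why late checking is hard to offer at billion scale.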
As you pointed out, I guess only a single dot can be plotted for each dataset if each line is drawn by tweaking a runtime parameter. I'm sorry for that.
By the way, I'm still busy and cannot start implementing a Python wrapper for PQTable. Honestly, PQTable is my previous method, and I'm now implementing a new one that comes with a full Python interface. Could you wait for the new one? (Of course, please feel free to play around with my C++ impl of PQTable :) but I'd like to focus on the new one in terms of Python bindings.)
Sure! As I said, I just used it to test our wrapper methods. Thanks for the clarification!