Python wrappers / inclusion into ANN-Benchmarks

Open maumueller opened this issue 7 years ago • 5 comments

Hi!

I just found your paper and repo, nice work! We are running a fairly large benchmark of ANN algorithms (https://github.com/erikbern/ann-benchmarks) and would like to include your implementation!

It would be easiest to include your algorithm if you provided Python wrappers for your code, for example using https://github.com/pybind/pybind11. This could also draw many more users to your algorithm, since many users of ANN algorithms code in Python.

Are there any plans to include a python wrapper? I could write the wrapper from our benchmark afterwards. Moreover, we also have a development branch where we allow inclusion from other programming languages by means of implementing a protocol. The development branch is here https://github.com/maumueller/ann-benchmarks and wrappers look like this https://github.com/maumueller/ann-benchmarks/blob/master/install/lib-dolphinn.cpp.

Best, Martin

maumueller avatar Dec 08 '17 08:12 maumueller

Hi @maumueller,

Thanks for inviting me, and I'd love to include my algorithm in your benchmark! I guess it'd be best to write the Python wrappers with pybind11. As I'm busy this month, I'll work on it in Jan or Feb next year.

Best, Yusuke

matsui528 avatar Dec 09 '17 02:12 matsui528

Hi @matsui528,

great to hear! Please keep me up-to-date and ping me if you have questions w.r.t. the benchmark.

Best, Martin

maumueller avatar Dec 10 '17 11:12 maumueller

Hi @matsui528,

a small update here: We are currently adding support for other languages in ann-benchmarks, and I chose your implementation to play around with a little bit. (Basically it boils down to implementing a wrapper like this: https://github.com/maumueller/pqtable/blob/master/wrapper/wrapper.cpp. We are still working on making it more easily accessible.)

I noticed that while there is support for top-k queries, there doesn't seem to be a parameter to improve the quality of the results. (I think I read something about that in your paper, but I couldn't find it as an option in your code.) E.g., I was expecting to see an option that gets the top-k' data points from the hash tables and then chooses the k closest of them through exact distance computations.
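The re-ranking step described above can be sketched roughly as follows. This is only an illustration, not part of PQTable: the function name `rerank`, the in-memory `vectors` list, and the candidate list (standing in for the top-k' ids returned by the hash tables) are all hypothetical.

```python
import heapq


def squared_l2(a, b):
    """Exact squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rerank(query, candidate_ids, vectors, k):
    """Pick the k candidates closest to `query` by exact distance.

    `candidate_ids` stands in for the top-k' ids returned by an
    approximate search (hypothetical here); `vectors` holds the
    original, uncompressed data points indexed by id.
    """
    scored = ((squared_l2(query, vectors[i]), i) for i in candidate_ids)
    return [i for _, i in heapq.nsmallest(k, scored)]


# Toy data: five 2-D points, query at the origin.
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.1, 0.0], [3.0, 3.0]]
candidates = [2, 4, 1, 3]  # pretend these came from the approximate top-k' search
top2 = rerank([0.0, 0.0], candidates, vectors, k=2)
print(top2)
```

Increasing k' would then trade query time for result quality, which is the kind of runtime parameter a benchmark can sweep over.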

Since this option is missing, we only get a single result for each dataset, which ranges in quality quite a bit depending on the dataset. Would be great to have an option that affects the result quality, maybe as sketched above?

Any thoughts?

Best, Martin

maumueller avatar Feb 24 '18 10:02 maumueller

Hi @maumueller,

PQTable doesn't have any runtime parameters. This is by design, because I don't want to bother users with lots of parameters :)

As you suggested, late checking against the original vectors is one direction. But I currently don't plan to do so, because keeping the original vectors takes additional memory. PQTable was originally developed to handle billion-scale data, and keeping billion-scale original vectors in memory is prohibitively expensive. Switching between search with late checking for million-scale data and search without late checking for billion-scale data could be a solution, but the design would be a bit difficult (PRs are welcome).
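To give a feeling for the memory argument, here is a back-of-the-envelope calculation. The numbers are illustrative assumptions, not figures from this thread: 128-dimensional float32 vectors and PQ codes with 8 sub-quantizers of 8 bits each.

```python
def raw_vector_bytes(n, dim, bytes_per_value=4):
    """Memory needed to keep n original vectors (float32 by default)."""
    return n * dim * bytes_per_value


def pq_code_bytes(n, num_subspaces, bits_per_subspace=8):
    """Memory needed for n PQ codes (one small integer per subspace)."""
    return n * num_subspaces * bits_per_subspace // 8


n = 10**9  # billion-scale dataset
raw = raw_vector_bytes(n, dim=128)       # assumed: 128-D float32 vectors
codes = pq_code_bytes(n, num_subspaces=8)  # assumed: 8 bytes per PQ code

print(f"original vectors: {raw / 1e9:.0f} GB")
print(f"PQ codes:         {codes / 1e9:.0f} GB")
```

Under these assumptions the original vectors need 64 times the memory of the codes, which is why late checking is easier to justify at million scale than at billion scale.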

As you pointed out, I guess only a single dot can be plotted for each dataset if each line is drawn by tweaking a runtime parameter. I'm sorry about that.

By the way, I'm still busy and cannot start implementing a Python wrapper for PQTable. Honestly, PQTable is my previous method, and I'm now implementing a new one that comes with a full Python interface. Could you wait for the new one? (Of course, please feel free to play around with my C++ implementation of PQTable :) but I'd like to focus on the new one when it comes to Python bindings.)

matsui528 avatar Feb 25 '18 17:02 matsui528

> By the way, I'm still busy and cannot start implementing a Python wrapper for PQTable. Honestly, PQTable is my previous method, and I'm now implementing a new one that comes with a full Python interface. Could you wait for the new one? (Of course, please feel free to play around with my C++ implementation of PQTable :) but I'd like to focus on the new one when it comes to Python bindings.)

Sure! As I said, I just used it to test our wrapper methods. Thanks for the clarification!

maumueller avatar Feb 25 '18 17:02 maumueller