big-ann-benchmarks

is this a permanent fork of ann-benchmarks?

Open erikbern opened this issue 2 years ago • 4 comments

It feels a bit weird to see a lot of activity on this repo rather than trying to contribute to the original one https://github.com/erikbern/ann-benchmarks

Is the ambition to merge it back into the main repo? Or is this just a short-lived repo anyway?

I'm happy to donate my code to something more neutral (e.g. we can set up a neutral github.com organization rather than have the code under my username). Seems like it would be beneficial to not diverge too far.

(also felt a bit weird that no one told me about this – I found out about it randomly)

@maumueller wdyt?

Jan 21 '22 14:01 erikbern

I'm not sure if merging is a good idea. Most of the dataset/index infrastructure developed here is completely useless for in-memory benchmarks. There is a shared core, but it differs quite a lot in the details. Extracting this shared core and using it in a module-like style for the different projects is possible, but this is difficult to maintain as well.

> (also felt a bit weird that no one told me about this – I found out about it randomly)

You were CC'ed in the initial mails in March last year, but there didn't seem to be any interest from your side.

Jan 21 '22 14:01 maumueller

Admittedly I haven't been the best maintainer of ann-benchmarks, so I sort of understand why we ended up in this situation.

But surely there must be benefits of keeping it together? At the least keeping all results in one place seems highly beneficial. But I'm also thinking about the next 5 years of maintenance cost of keeping two code bases up to date with new algorithms etc. But maybe the time frame of this repo is more imminent.

Not trying to come across as negative – I think it's great that you're doing these experiments, and I'm happy my benchmarks provided a starting point for all of this. I'm raising this more from the perspective of what's best for the long-term value to people interested in ANN.

Jan 21 '22 16:01 erikbern

I think moving everything to an organization would be a good starting point, what do you think @harsha-simhadri?

@erikbern I think the main problem is that these repositories target very different implementations. In the time required to build a single index for a single implementation here, we can basically run all of ann-benchmarks :-) Some of these issues (index loading/saving, dataset handling) differ quite a lot and it would make for a very messy joint version. None of the implementations in ann-benchmarks work here out of the box, and none of the implementations here improve the status quo in ann-benchmarks.

I could draft what would be necessary in a central core and what is better off in individual repos if we think that might be a way forward.

Jan 24 '22 11:01 maumueller

Martin, I defer to you on whether these are mergeable. To add to your points on where the functionality of the code is different, for the next iteration, I want to work on some distributed orchestration of launching jobs and retrieving results.

Feb 15 '22 05:02 harsha-simhadri