ann-benchmarks
Random datasets too easy
The random blobs created with sklearn seem to be far too easy to solve (see the screenshot below). I propose bringing back the old random datasets generated with https://github.com/maumueller/random-inputs/. For me, the easiest way would be to create them, put them on a website, and add a function to datasets.py that fetches them and transforms them correctly.
ok – i can also try to increase the number of blobs?
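A rough sketch of what "more blobs" could look like, assuming the current data comes from sklearn's `make_blobs`. The helper name `random_blobs` and all sizes here are made up for illustration and are not taken from datasets.py:

```python
import numpy as np
from sklearn.datasets import make_blobs


def random_blobs(n_samples=2000, n_dims=20, n_centers=500, seed=1):
    """Hypothetical helper: Gaussian blobs with a configurable number of centers."""
    X, _ = make_blobs(
        n_samples=n_samples,
        n_features=n_dims,
        centers=n_centers,   # more centers -> fewer points per cluster
        random_state=seed,   # fixed seed so repeated runs give the same data
    )
    return X.astype(np.float32)


few = random_blobs(n_centers=10)    # 200 points per blob
many = random_blobs(n_centers=500)  # 4 points per blob
print(few.shape, many.shape)
```

Whether more centers actually makes the queries harder would still need to be checked by rerunning the benchmark; this only shows the knob.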
i think, all else being equal, it's good to have a reproducible pipeline. if we end up using your algorithm that's fine, but is it hard to reimplement in python/numpy?
thanks for running this experiment btw!!