ann-benchmarks

Random datasets too easy

Open maumueller opened this issue 7 years ago • 2 comments

The random blobs generated with sklearn seem to be far too easy to solve; see the screenshot below. I propose bringing back the old random datasets generated with https://github.com/maumueller/random-inputs/. The easiest way for me would be to generate them, host them on a website, and add a function to datasets.py that fetches them and transforms them correctly.
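A rough sketch of what such a fetch-and-transform function could look like, using only the standard library. Everything here is an assumption for illustration: the function names, the placeholder URL argument, the plain-text file format (one vector per line, whitespace-separated floats), and the seeded train/query split are all hypothetical and are not taken from datasets.py or from random-inputs.

```python
import os
import random
from urllib.request import urlretrieve


def download(url, path):
    """Fetch url to path unless a local copy already exists."""
    if not os.path.exists(path):
        urlretrieve(url, path)


def load_vectors(path):
    """Parse one vector per line, whitespace-separated floats (assumed format)."""
    with open(path) as f:
        return [[float(x) for x in line.split()] for line in f if line.strip()]


def split_train_test(vectors, n_test, seed=1):
    """Shuffle with a fixed seed and hold out n_test vectors as queries."""
    shuffled = list(vectors)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def random_inputs_dataset(url, path, n_test):
    """Hypothetical entry point: download once, parse, then split.

    url is a placeholder for wherever the pre-generated file is hosted.
    """
    download(url, path)
    return split_train_test(load_vectors(path), n_test)
```

Because the download is skipped when the file already exists and the split uses a fixed seed, repeated runs would see the same train and query sets.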

[screenshot]

maumueller, Nov 23 '17 11:11

ok – i can also try to increase the number of blobs?
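for reference, a minimal sketch of what "number of blobs" controls. this is a pure-Python stand-in written for illustration, not sklearn's implementation: the function name `gaussian_blobs` and the parameters `box` and `seed` are hypothetical, and the real generator in sklearn may differ in how it places centers and assigns samples.

```python
import random


def gaussian_blobs(n_samples, n_features, centers, cluster_std=1.0, box=10.0, seed=1):
    """Draw points from `centers` isotropic Gaussian clusters.

    Cluster centers are sampled uniformly from [-box, box] in each dimension;
    samples are assigned to clusters round-robin.
    """
    rng = random.Random(seed)
    center_points = [
        [rng.uniform(-box, box) for _ in range(n_features)]
        for _ in range(centers)
    ]
    points, labels = [], []
    for i in range(n_samples):
        label = i % centers
        # Each coordinate is the center coordinate plus Gaussian noise.
        points.append([rng.gauss(mu, cluster_std) for mu in center_points[label]])
        labels.append(label)
    return points, labels


# Same sample count, more clusters: each blob holds fewer points.
few_points, few_labels = gaussian_blobs(1000, 16, centers=10)
many_points, many_labels = gaussian_blobs(1000, 16, centers=100)
```

whether more blobs actually makes the nearest-neighbour problem harder would still need to be checked by rerunning the benchmark.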

i think all else being equal, it's good to have a reproducible pipeline. if we end up using your algorithm that's fine, but is it hard to reimplement in python/numpy?

erikbern, Nov 23 '17 19:11

thanks for running this experiment btw!!

erikbern, Nov 23 '17 19:11