
Anonymized real-world data set

Open derwiki opened this issue 13 years ago • 4 comments

The test data is great, but it makes it hard to test performance at the scale Suggestomatic was intended for. Internally at Causes we have a test set of about 900 million records; we should obfuscate the group and user IDs and release the large-scale data set.

derwiki, Aug 02 '11 19:08
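One common way to obfuscate IDs while keeping the membership structure intact is to replace each ID with a keyed hash (HMAC), so the same user or group always maps to the same pseudonym and the mapping can't be recomputed without the secret key. The sketch below assumes the records are `(user_id, group_id)` pairs; the function name `pseudonymize` and the key handling are hypothetical, not part of Suggestomatic. Note that this only hides the raw IDs: it does nothing against re-identification from the shape of the membership graph itself.

```python
import hashlib
import hmac


def pseudonymize(records, secret_key: bytes):
    """Replace user and group IDs with keyed-hash pseudonyms (hypothetical helper).

    records: iterable of (user_id, group_id) pairs (assumed record format).
    The same input ID always yields the same pseudonym, so co-membership
    structure is preserved; without secret_key the mapping can't be rebuilt.
    """

    def token(kind: str, raw_id) -> str:
        # Prefix with the ID kind so user 5 and group 5 get different tokens.
        msg = f"{kind}:{raw_id}".encode()
        # Keep 128 bits (32 hex chars): collisions stay negligible even at
        # roughly a billion distinct IDs, unlike a 64-bit truncation.
        return hmac.new(secret_key, msg, hashlib.sha256).hexdigest()[:32]

    for user_id, group_id in records:
        yield token("user", user_id), token("group", group_id)
```

Usage: `list(pseudonymize([(1, 10), (2, 10)], b"keep-this-secret"))` returns two pairs that share the same group token but have different user tokens.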

Anonymizing data is hard. It's likely easier to just generate a larger test set with similar distributions.

kristjan, Aug 02 '11 19:08
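A naive version of "a larger test set with similar distributions" can be sketched by matching two marginals of the real data: how many groups each user joins, and how popular each group is. The sketch below assumes `(user_id, group_id)` records and the hypothetical function name `synthesize`. It emits fresh integer IDs for users and groups, so no real IDs leak, but it does not preserve which groups tend to be joined together, which is exactly the co-membership signal Suggestomatic is built on.

```python
import random
from collections import Counter
from itertools import accumulate


def synthesize(records, num_users: int, seed: int = 0):
    """Generate synthetic (user, group) pairs matching two observed marginals.

    Per-user membership counts and group popularity roughly follow the input;
    correlations between groups are NOT preserved. Hypothetical sketch.
    """
    rng = random.Random(seed)
    user_degree = Counter(u for u, _ in records)  # memberships per user
    group_size = Counter(g for _, g in records)   # members per group
    degrees = list(user_degree.values())
    # Relabel groups as 0..G-1 so the output contains no real group IDs.
    groups = list(range(len(group_size)))
    # Cumulative weights are computed once instead of on every draw.
    cum_weights = list(accumulate(group_size.values()))

    for new_user in range(num_users):
        # Draw this synthetic user's membership count from the observed counts.
        k = min(rng.choice(degrees), len(groups))
        chosen = set()
        while len(chosen) < k:
            # Pick a group with probability proportional to its observed size.
            chosen.add(rng.choices(groups, cum_weights=cum_weights, k=1)[0])
        for g in chosen:
            yield new_user, g
```

Scaling `num_users` up gives a larger set with similar marginals, but a real evaluation would still need the joint structure that the next comment points out is hard to reproduce.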

The Comment & Close button is far too clickable.

kristjan, Aug 02 '11 19:08

Generating a large test set with similar distributions is also hard :)

derwiki, Aug 02 '11 19:08

Looking forward to chewing on it!

jl, Aug 02 '11 23:08