snowman
Preload more Datasets
We should preload more datasets. Because some datasets are quite large, we should implement a feature that adds only a reference to a dataset and downloads it at runtime, with no further action required from the user.
- The WDC training dataset and gold standard for large-scale product matching
- Alaska Benchmark for Big Data Integration tasks
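
To make the reference-only idea concrete, here is a minimal sketch in Python. It is not based on the existing code base: `DatasetReference`, `ensure_local`, and the cache directory layout are hypothetical names chosen for illustration. It assumes each dataset is a single file reachable by URL.

```python
import shutil
import urllib.request
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DatasetReference:
    """Hypothetical record: only the name and source URL ship with the app."""

    name: str
    url: str

    def ensure_local(self, cache_dir: Path) -> Path:
        """Return the cached file, downloading it on first access only."""
        target = cache_dir / self.name
        if target.exists():
            return target
        cache_dir.mkdir(parents=True, exist_ok=True)
        # Write to a temporary ".part" file first so an interrupted
        # transfer never leaves a partial file at the final location.
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(self.url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(target)
        return target
```

The first call to `ensure_local` fetches the file, and later calls return the cached copy, so the user never has to trigger the download explicitly. Archives that need unpacking, checksums, and progress reporting are left out of this sketch.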
After implementing #112
Synthetic datasets would also be interesting, maybe with custom configuration?
- Febrl
The datasets released as part of the SIGMOD contest today are included with v2.0.0.