scandinavian-embedding-benchmark
reducing the size of large datasets
If it is possible to reduce the size of some datasets without changing the performance too much, it would be great, as it would ensure that the benchmark runs faster.
I am especially thinking of ScaLA and Da Political Comments, as well as Massive Intent and Massive Scenario.
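One way to shrink these classification-style tasks without distorting them would be a label-stratified subsample, so per-class proportions stay roughly the same. A minimal sketch using only the standard library; the names (`texts`, `labels`, `subsample`) are illustrative, not part of the benchmark's actual API:

```python
import random
from collections import defaultdict

def subsample(texts, labels, n_samples, seed=42):
    """Return a label-stratified subsample of (texts, labels)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, lab in enumerate(labels):
        by_label[lab].append(i)
    frac = n_samples / len(texts)
    keep = []
    for lab, idxs in by_label.items():
        # keep at least one example per class, scaling the rest proportionally
        k = max(1, round(len(idxs) * frac))
        keep.extend(rng.sample(idxs, k))
    keep.sort()
    return [texts[i] for i in keep], [labels[i] for i in keep]

# toy data: 1000 documents with a 3:1 label imbalance
texts = [f"doc {i}" for i in range(1000)]
labels = ["pos" if i % 4 else "neg" for i in range(1000)]
small_texts, small_labels = subsample(texts, labels, 200)
```

Whether the reduced sets preserve model rankings would of course need to be checked empirically before changing the benchmark.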
@x-tabdeveloping would you think this is reasonable as well?
Hmm, yeah, it would be nice if we could make it faster somehow. Especially if we're planning on bootstrapping stuff, then it's a really good idea.
Well, only bootstrapping the evaluation (not the encoding) - but agreed.
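To make the distinction concrete: the idea would be to encode once and resample only the per-example scores to get a confidence interval, so the expensive embedding step is never repeated. A rough standard-library sketch; `bootstrap_ci` and the toy scores are illustrative, not the benchmark's actual implementation:

```python
import random
import statistics

def bootstrap_ci(per_example_scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-example scores.

    The scores come from a single encoding pass; only the cheap
    aggregation step is repeated n_boot times.
    """
    rng = random.Random(seed)
    n = len(per_example_scores)
    means = []
    for _ in range(n_boot):
        resample = [per_example_scores[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# e.g. per-example correctness (1.0 = correct) for a classifier at 80% accuracy
scores = [1.0] * 80 + [0.0] * 20
low, high = bootstrap_ci(scores)
```

Since each bootstrap iteration only touches cached scores, the cost is negligible next to encoding, which is why shrinking the datasets mainly helps the encoding side.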
Notably, this got better with the fixes in #130 (which don't actually change the dataset sizes).