scandinavian-embedding-benchmark reducing the size of large datasets

reducing the size of large datasets

Open KennethEnevoldsen opened this issue 1 year ago • 3 comments

If we can reduce the size of some datasets without changing model performance too much, it would be great to do so, so that the benchmark runs faster.

I am especially thinking of ScaLA, Da Political comments, as well as massive intent and massive scenario.

@x-tabdeveloping would you think this is reasonable as well?

Jan 25 '24 07:01 KennethEnevoldsen
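
For illustration, one way to shrink a labelled dataset while keeping the class balance roughly intact is a stratified subsample. This is a minimal sketch, not code from the benchmark: the function name `stratified_subsample` and its parameters are hypothetical, and it assumes the labels are available as a plain Python list.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, max_size, seed=42):
    """Return sorted indices of a label-stratified subsample.

    Hypothetical helper for illustration: keeps at most `max_size`
    examples and preserves the label proportions as closely as
    integer rounding allows.
    """
    n = len(labels)
    if n <= max_size:
        return list(range(n))

    rng = random.Random(seed)  # fixed seed so the subsample is reproducible

    # Group example indices by label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    chosen = []
    for idxs in by_label.values():
        # Proportional share of the budget, at least one example per label.
        k = max(1, len(idxs) * max_size // n)
        chosen.extend(rng.sample(idxs, k))

    # Rare labels can push the total over budget; trim if that happens.
    if len(chosen) > max_size:
        chosen = rng.sample(chosen, max_size)
    return sorted(chosen)


labels = ["a"] * 80 + ["b"] * 20
subset = stratified_subsample(labels, max_size=10)
```

Whether a given size is small enough to be fast but large enough to leave scores stable would still need to be checked per dataset, for example by comparing model rankings on the full and reduced versions.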

Hmm yeah it would be nice if we could make it faster somehow, especially if we're planning on bootstrapping stuff, then it's a really good idea.

Jan 31 '24 08:01 x-tabdeveloping

Well, we would only be bootstrapping the evaluation (not the encoding), but I agree.

Jan 31 '24 10:01 KennethEnevoldsen
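
To make the distinction above concrete: bootstrapping only the evaluation means the expensive encoding step runs once, and only the cheap scoring step is repeated on resampled examples. The sketch below assumes the predictions have already been computed from cached embeddings. The function name `bootstrap_accuracy` and its parameters are hypothetical, and accuracy is used as a stand-in metric; this is not the benchmark's actual implementation.

```python
import random


def bootstrap_accuracy(predictions, labels, n_resamples=1000, seed=42):
    """Point estimate and 95% percentile interval for accuracy.

    Hypothetical helper for illustration: resamples per-example
    correctness with replacement, so no model call is needed.
    """
    rng = random.Random(seed)
    n = len(labels)

    # Per-example correctness is computed once and then only resampled.
    correct = [int(p == y) for p, y in zip(predictions, labels)]

    scores = []
    for _ in range(n_resamples):
        sample = rng.choices(correct, k=n)  # sampling with replacement
        scores.append(sum(sample) / n)

    scores.sort()
    lower = scores[int(0.025 * n_resamples)]
    upper = scores[min(n_resamples - 1, int(0.975 * n_resamples))]
    return sum(correct) / n, (lower, upper)


point, (low, high) = bootstrap_accuracy([1, 1, 1, 0], [1, 1, 1, 1])
```

Since each resample only touches a list of 0/1 values, the cost of the bootstrap is small compared with encoding, though it still grows with dataset size.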

Notably better with the fixes in #130 (though they don't actually change the size).

Feb 06 '24 07:02 KennethEnevoldsen
