scandinavian-embedding-benchmark

reducing the size of large datasets

Open • KennethEnevoldsen opened this issue 1 year ago • 3 comments

If we can reduce the size of some datasets without changing the resulting scores too much, it would be a great way to make the benchmark run faster.

I am especially thinking of ScaLA, Da Political comments, as well as massive intent and massive scenario.

@x-tabdeveloping would you think this is reasonable as well?

KennethEnevoldsen, Jan 25 '24 07:01
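One way the size reduction proposed above could be approached is a seeded, label-stratified subsample, so that class proportions stay roughly the same while the number of texts to encode drops. The sketch below is an illustration only: `stratified_subsample`, `max_size`, and `seed` are hypothetical names, not part of the benchmark's code, and whether scores stay stable would still need to be checked per dataset.

```python
import random
from collections import defaultdict


def stratified_subsample(texts, labels, max_size, seed=42):
    """Keep roughly `max_size` examples while preserving label proportions.

    Hypothetical helper for illustration; not the benchmark's own API.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    n = len(texts)
    if n <= max_size:
        # Nothing to do: the dataset is already small enough.
        return list(texts), list(labels)

    rng = random.Random(seed)  # fixed seed so every run sees the same subset

    # Group example indices by label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    # Sample the same fraction from every label (at least one example each).
    # Rounding means the result can differ from max_size by a few examples.
    fraction = max_size / n
    kept = []
    for indices in by_label.values():
        k = max(1, round(len(indices) * fraction))
        kept.extend(rng.sample(indices, k))

    kept.sort()  # preserve the original ordering of the kept examples
    return [texts[i] for i in kept], [labels[i] for i in kept]
```

Comparing model rankings on the full and the reduced dataset would be the natural check before adopting a smaller version.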

Hmm, yeah, it would be nice if we could make it faster somehow. If we're planning on bootstrapping, it's a really good idea.

x-tabdeveloping, Jan 31 '24 08:01

Well, we would only be bootstrapping the evaluation (not the encoding), but I agree.

KennethEnevoldsen, Jan 31 '24 10:01
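To make the distinction above concrete: if the texts are encoded once and the per-example outcomes are cached, bootstrapping only has to resample those cached outcomes, which is cheap compared to encoding. The sketch below is a simplified illustration under that assumption. `bootstrap_accuracy` and its parameters are hypothetical, and a real task might instead refit a classifier on resampled embeddings.

```python
import random


def bootstrap_accuracy(correct, n_resamples=1000, seed=0):
    """Bootstrap accuracy from cached per-example outcomes (1 = correct, 0 = wrong).

    The encoder is never called here; only the evaluation is resampled.
    Returns (mean, lower, upper), where lower/upper bound a 95% percentile interval.
    """
    n = len(correct)
    if n == 0:
        raise ValueError("need at least one evaluated example")

    rng = random.Random(seed)
    scores = []
    for _ in range(n_resamples):
        # Resample the cached outcomes with replacement.
        sample = rng.choices(correct, k=n)
        scores.append(sum(sample) / n)

    scores.sort()
    lower = scores[int(0.025 * n_resamples)]
    upper = scores[min(n_resamples - 1, int(0.975 * n_resamples))]
    mean = sum(scores) / n_resamples
    return mean, lower, upper
```

Under this setup the cost of each resample scales with the number of evaluated examples, so smaller datasets would also make the bootstrap itself faster.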

Notably better after the fixes in #130 (which don't actually change the dataset sizes).

KennethEnevoldsen, Feb 06 '24 07:02