scandinavian-embedding-benchmark

A Scandinavian Benchmark for sentence embeddings

Results 26 scandinavian-embedding-benchmark issues

https://danlp-alexandra.readthedocs.io/en/latest/docs/datasets.html#ddisco

dataset

https://github.com/kuhumcst/danish-semantic-reasoning-benchmark

dataset

https://www.mixedbread.ai/blog/mxbai-embed-large-v1

model

Extending the dataset to other Scandinavian languages
**Before implementation, check whether each of these resources is translated:**
- Greenlandic
- Danish-Greenlandic
- Greenlandic news
- Icelandic...

For example, ScaLA is natural text that has been synthetically augmented (and evaluated by humans). Other construction methods could include translations. Others could be found or expert-generated. It is probably reasonable to...

https://huggingface.co/Salesforce/SFR-Embedding-Mistral

model

https://huggingface.co/BAAI/bge-m3

model

Add metadata on socioeconomic status to the datasets.

documentation