Does it support Korean and Japanese?

Open · sz2three opened this issue 1 year ago · 1 comment

It supports Chinese, but does it also work for Korean and Japanese?

sz2three, Jun 14 '23 00:06

You can check Section 4.4 of the MTEB paper (https://arxiv.org/pdf/2210.07316.pdf), where https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco is benchmarked against other models on many languages, including Korean and Japanese. Because it has not seen much of either language in pre-training, it performs rather poorly on them.

You may want to use a different model for those languages. The MTEB leaderboard (https://huggingface.co/spaces/mteb/leaderboard) shows which models perform best on them.

Muennighoff, Jun 14 '23 07:06
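
For anyone who wants a quick sanity check on their own Korean or Japanese data before committing to a model, here is a minimal sketch of a top-1 retrieval test. It is not from the SGPT repo or the MTEB code: `toy_embed`, `cosine`, and `top1_accuracy` are hypothetical helper names, and `toy_embed` is only a character-bigram stand-in so the script runs without downloads. The assumption is that you replace it with a function that calls whichever embedding model you are comparing.

```python
import math
from collections import Counter
from typing import Callable, List, Mapping, Tuple


def toy_embed(text: str) -> Counter:
    """Placeholder embedder (assumption): character-bigram counts.

    Swap this for a function that returns the embedding of a real model.
    """
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as mappings."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top1_accuracy(
    embed: Callable[[str], Mapping[str, float]],
    pairs: List[Tuple[str, str]],
) -> float:
    """Fraction of queries whose own document is ranked first.

    `pairs` holds (query, relevant_document) tuples; every document in
    the list serves as a distractor for the other queries.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    doc_vecs = [embed(doc) for _, doc in pairs]
    hits = 0
    for i, (query, _) in enumerate(pairs):
        query_vec = embed(query)
        best = max(
            range(len(doc_vecs)),
            key=lambda j: cosine(query_vec, doc_vecs[j]),
        )
        hits += int(best == i)
    return hits / len(pairs)


# Tiny illustrative sample (made up for this sketch, not benchmark data).
sample_pairs = [
    ("東京の天気", "東京の天気は晴れです"),
    ("猫の餌", "猫の餌はどれがいいですか"),
    ("서울 날씨", "서울 날씨는 맑습니다"),
]

if __name__ == "__main__":
    print("top-1 accuracy:", top1_accuracy(toy_embed, sample_pairs))
```

A handful of pairs like this only catches gross failures. The MTEB results mentioned above are the better guide for choosing a model.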