New Model: OpenAI text embedding v3
Yes; OpenAI text-embedding-ada-002 is on the leaderboard
@Muennighoff how can I test the new embedding v3 models? I want to reproduce the results.
There are scripts for the previous openai embedding models here: https://github.com/embeddings-benchmark/mtebscripts
If you modify it for v3, it would be cool if you could add it via a PR!
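For reference, here is a minimal sketch of what such a modification could look like. It assumes the openai>=1.0 Python client, and OpenAIv3Encoder (and its defaults) is a made-up name for illustration rather than anything in mtebscripts; MTEB just needs an object exposing encode(sentences, ...) that returns one vector per sentence.

# Sketch only: OpenAIv3Encoder is a hypothetical wrapper, not part of mtebscripts.
# Assumes the openai>=1.0 Python client and OPENAI_API_KEY set in the environment.
import numpy as np
from openai import OpenAI


class OpenAIv3Encoder:
    def __init__(self, model_name="text-embedding-3-small", dimensions=None):
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self.model_name = model_name
        self.dimensions = dimensions  # v3 models accept an optional output size

    def encode(self, sentences, batch_size=128, **kwargs):
        embeddings = []
        for i in range(0, len(sentences), batch_size):
            # The embeddings endpoint can reject empty strings, so pad them.
            batch = [s if s.strip() else " " for s in sentences[i : i + batch_size]]
            request = {"model": self.model_name, "input": batch}
            if self.dimensions is not None:
                request["dimensions"] = self.dimensions
            response = self.client.embeddings.create(**request)
            embeddings.extend(d.embedding for d in response.data)
        return np.asarray(embeddings)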
@Muennighoff I think directly passing the model name will work, but I'm not able to run your existing code for the older models. Could you let me know what requirements.txt this evaluation project needs, and which Python version is best?
error
Hmm, I recommend using Python >= 3.9, and maybe try upgrading mteb / datasets.
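There is no pinned requirements.txt in this thread, so as a rough sanity check of the environment, something like the snippet below prints the relevant versions; the package list is only an assumption about what the scripts rely on.

# Rough environment check; the package list is an assumption, not an official
# requirements.txt for mtebscripts.
import sys
from importlib.metadata import PackageNotFoundError, version

print("python", sys.version.split()[0])
for pkg in ("mteb", "datasets", "openai", "sentence-transformers"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")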
@Muennighoff I'm using Python 3.9.18, but why am I getting this error for the OpenAI run?
Can you provide some info on how to run it? It isn't clear from the README.
INFO:mteb.evaluation.MTEB:
## Evaluating 1 tasks:
──────────── Selected tasks ────────────
Classification
- AmazonCounterfactualClassification, s2s, multilingual 1 / 4 langs
INFO:mteb.evaluation.MTEB:
********************** Evaluating AmazonCounterfactualClassification **********************
INFO:mteb.evaluation.MTEB:Loading dataset for AmazonCounterfactualClassification
/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
ERROR:mteb.evaluation.MTEB:Error while evaluating AmazonCounterfactualClassification: Loading a dataset cached in a LocalFileSystem is not supported.
Traceback (most recent call last):
  File "/home/akash/LANCEDB/mtebscripts-main/run_array_openaiv2.py", line 226, in <module>
    main(args)
  File "/home/akash/LANCEDB/mtebscripts-main/run_array_openaiv2.py", line 222, in main
    evaluation.run(model, output_folder=f"results/{model_name}", batch_size=args.batchsize, eval_splits=eval_splits, corpus_chunk_size=10000)
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/mteb/evaluation/MTEB.py", line 289, in run
    raise e
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/mteb/evaluation/MTEB.py", line 261, in run
    task.load_data(eval_splits=task_eval_splits)
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/mteb/abstasks/MultilingualTask.py", line 25, in load_data
    self.dataset[lang] = datasets.load_dataset(
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/datasets/load.py", line 2149, in load_dataset
    ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
  File "/home/akash/anaconda3/envs/newenv/lib/python3.9/site-packages/datasets/builder.py", line 1173, in as_dataset
    raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.
This looks like an issue with datasets rather than MTEB; maybe try https://stackoverflow.com/questions/77433096/notimplementederror-loading-a-dataset-cached-in-a-localfilesystem-is-not-suppor
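For context, the linked Stack Overflow thread attributes this NotImplementedError to a version mismatch between the installed datasets and fsspec packages; the exact version cutoffs are an assumption here, so check the thread before changing anything. A quick check of what is installed:

# Per the linked thread, an older `datasets` combined with a newer `fsspec` can
# raise "Loading a dataset cached in a LocalFileSystem is not supported."
# Upgrading datasets (or pinning fsspec back) and re-running is the usual fix.
from importlib.metadata import version

print("datasets:", version("datasets"), "| fsspec:", version("fsspec"))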
I'm getting the same error. I wanted to run it in online mode, not offline.
Most of OpenAI's models are included on the benchmark, and we supply an implementation for evaluating them on MTEB within the models folder. I'll close this for now, but feel free to re-open.
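To round off the earlier "how do I run it" question, a minimal run sketch looks roughly like the following; OpenAIv3Encoder is the hypothetical wrapper from the sketch above (any sentence-transformers model works the same way), and the single task and output folder are arbitrary examples.

# Minimal MTEB run sketch; the task, language filter and output folder are
# arbitrary examples, and OpenAIv3Encoder is the hypothetical wrapper above.
from mteb import MTEB

model = OpenAIv3Encoder("text-embedding-3-small")
evaluation = MTEB(tasks=["AmazonCounterfactualClassification"], task_langs=["en"])
evaluation.run(model, output_folder="results/text-embedding-3-small", eval_splits=["test"])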