
Evaluation of recent LLMs, like Llama 2

Closed · dhuynh95 opened this issue 2 years ago · 4 comments

Hi there,

I am quite curious to see how recent LLMs such as Llama 2 perform in terms of embedding quality, but I haven't found anything about it on MTEB. The sentence_transformers library seems a bit outdated and doesn't include Llama 2.

Has there been any work on this topic? I would be interested in contributing if you could provide some pointers.

dhuynh95 · Aug 14 '23, 22:08

Hey! I think using the raw Llama 2 model would not perform very well; you will likely need to fine-tune it. You can check the SGPT paper for background on this: https://arxiv.org/abs/2202.08904

I am planning to fine-tune Llama-2 soon using the SGPT-BE recipe, just haven't had the time thus far. Feel free to try it out!

Otherwise, you can of course also try evaluating it without fine-tuning, e.g. using the position-weighted mean pooling from SGPT, though I doubt it will be competitive.
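
Roughly, that pooling looks like the following (a minimal, untested sketch; the checkpoint name, right-padding choice, and max length are my assumptions, not part of the SGPT recipe itself):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # example checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 ships without a pad token
tokenizer.padding_side = "right"  # the position weights below assume right padding
model = AutoModel.from_pretrained(model_name)
model.eval()

@torch.no_grad()
def encode(sentences):
    """Position-weighted mean pooling over the last hidden states, SGPT-BE style."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      max_length=2048, return_tensors="pt")
    last_hidden = model(**batch).last_hidden_state   # (B, T, D)
    mask = batch["attention_mask"].unsqueeze(-1)     # (B, T, 1)
    # Weight token i by its 1-indexed position: later tokens have seen more
    # context under causal attention, so they contribute more to the embedding.
    positions = torch.arange(1, last_hidden.size(1) + 1, device=last_hidden.device)
    weights = positions.view(1, -1, 1) * mask        # zero out padding positions
    return (last_hidden * weights).sum(dim=1) / weights.sum(dim=1)  # (B, D)
```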

Muennighoff · Aug 15 '23, 11:08

Hello! From https://github.com/embeddings-benchmark/mteb/issues/129 it seems that @DJT777 may be trying this as well? 👀

NouamaneTazi · Aug 15 '23, 13:08

@NouamaneTazi @Muennighoff @dhuynh95

Yes, I'm currently running some preliminary evaluations of the base Llama 2 models' embedding spaces, with the end goal of building an embedding generator from Llama 2. I'm very interested in developing an LLM-based embedding model.
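
For the evaluations, I'm essentially wrapping the model in a minimal class and handing it to MTEB, roughly like this (a sketch following the MTEB README; `encode` refers to the pooling function sketched above, and the task choice is just an example):

```python
import numpy as np
from mteb import MTEB

class Llama2Embedder:
    """MTEB only requires an object with an `encode` method that maps a
    list of sentences to a 2-D array of embeddings, one row per sentence."""

    def encode(self, sentences, batch_size=32, **kwargs):
        out = []
        for i in range(0, len(sentences), batch_size):
            out.append(encode(sentences[i : i + batch_size]).cpu().numpy())
        return np.concatenate(out)

evaluation = MTEB(tasks=["Banking77Classification"])
evaluation.run(Llama2Embedder(), output_folder="results/llama2")
```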

DJT777 · Aug 16 '23, 18:08

Seems like this issue has gone stale, so I'll close it, but feel free to reopen it if needed.

KennethEnevoldsen · Jun 05 '24, 18:06