llm2vec
Create llm2vec based on the new llama-3-8b
Can't wait to see the results :)
We are on it :)
Llama 3 is full of surprises, so please stay tuned. We are analyzing the results we have and will post an update early next week, as unraveling some of the mysteries is taking time.
💥💥💥 llm2vec-llama-3 is coming out tomorrow 💥💥💥
Is it tomorrow yet?
Yes, it's tomorrow! It's out!
Llama-3 has been added to the model list
Thanks a lot! By the way, the first and the third links point to the Mistral model:
Fixed, thanks!
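For anyone who wants to try it out, here is a minimal usage sketch with the llm2vec Python API. The model IDs below are the Llama-3 entries as they appear in the model list; double-check the exact names on the McGill-NLP Hugging Face page before running.

```python
# Minimal usage sketch: load the MNTP-trained Llama-3 base together with the
# supervised PEFT weights and encode a query/document pair.
# The model IDs are assumptions taken from the model list; verify them on the
# McGill-NLP Hugging Face page.
import torch
from llm2vec import LLM2Vec

l2v = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

# Queries are passed as [instruction, text] pairs, documents as plain strings.
queries = [["Retrieve Wikipedia passages that answer the question", "What is llm2vec?"]]
documents = ["LLM2Vec converts decoder-only LLMs into strong text encoders."]

q_reps = l2v.encode(queries)
d_reps = l2v.encode(documents)
print(torch.nn.functional.cosine_similarity(q_reps, d_reps))
```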
@vaibhavad I have prepared an llm2vec version for Qwen2, but there are some problems. Could you walk me through the process of preparing the code for a new model?
Thank you! Did you try to score this model with MTEB?
We did. Here is a Twitter/X thread summarizing our findings. All MTEB scores are also present in the Hugging Face repos of these models. The results should be updated on the MTEB leaderboard in a couple of days.
@Iambestfeed Can you share your PR or working branch?
@vaibhavad Hmmm, you can check this repo https://github.com/Iambestfeed/qwen2vec
Colab here
Bugs:
```
/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py:521: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
warnings.warn(
[INFO|trainer.py:2048] 2024-05-01 15:31:52,965 >> ***** Running training *****
[INFO|trainer.py:2049] 2024-05-01 15:31:52,965 >> Num examples = 822
[INFO|trainer.py:2050] 2024-05-01 15:31:52,965 >> Num Epochs = 3
[INFO|trainer.py:2051] 2024-05-01 15:31:52,965 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2054] 2024-05-01 15:31:52,965 >> Total train batch size (w. parallel, distributed & accumulation) = 1
[INFO|trainer.py:2055] 2024-05-01 15:31:52,965 >> Gradient Accumulation steps = 1
[INFO|trainer.py:2056] 2024-05-01 15:31:52,965 >> Total optimization steps = 2,466
[INFO|trainer.py:2057] 2024-05-01 15:31:52,968 >> Number of trainable parameters = 7,569,408
0% 0/2466 [00:00<?, ?it/s][WARNING|logging.py:329] 2024-05-01 15:31:53,047 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
Traceback (most recent call last):
File "/content/qwen2vec/experiments/run_mntp.py", line 985, in <module>
main()
File "/content/qwen2vec/experiments/run_mntp.py", line 933, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2203, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3138, in training_step
loss = self.compute_loss(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3161, in compute_loss
outputs = model(**inputs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 1169, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 563, in forward
return self.get_base_model()(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 1000, in forward
if attention_mask is not None and self._attn_implementation == "flash_attention_2" and use_cache:
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1688, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Qwen2BiModel' object has no attribute '_attn_implementation'. Did you mean: '_autoset_attn_implementation'?
0% 0/2466 [00:00<?, ?it/s]
```
@Iambestfeed It looks like your implementation follows Llama; however, looking at modeling_qwen2.py in the transformers library, Qwen2's implementation is actually closer to Mistral.
Different models differ in how they implement the causal attention mechanism. Hence, the workflow for creating a new bidirectional model requires understanding the native implementation and then overriding the relevant parts.
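For reference, here is a minimal sketch of the first half of that workflow for Qwen2: subclassing the attention variants with the causal flag turned off and swapping them into the decoder layers, mirroring what llm2vec does for Llama and Mistral. Class names are taken from modeling_qwen2.py in transformers >= 4.40; this is not the code from the PR, and the causal-mask construction inside Qwen2Model.forward also has to be made non-causal, which is omitted here.

```python
# Sketch only: swap non-causal attention subclasses into Qwen2's decoder layers.
# Assumes transformers >= 4.40; the attention-mask preparation in
# Qwen2Model.forward still needs to be made non-causal separately.
import torch.nn as nn
from transformers.models.qwen2.modeling_qwen2 import (
    Qwen2Attention,
    Qwen2Config,
    Qwen2DecoderLayer,
    Qwen2FlashAttention2,
    Qwen2Model,
    Qwen2SdpaAttention,
)


class ModifiedQwen2Attention(Qwen2Attention):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.is_causal = False  # disable the causal flag used by the attention kernels


class ModifiedQwen2FlashAttention2(Qwen2FlashAttention2):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.is_causal = False


class ModifiedQwen2SdpaAttention(Qwen2SdpaAttention):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.is_causal = False


MODIFIED_QWEN2_ATTENTION_CLASSES = {
    "eager": ModifiedQwen2Attention,
    "flash_attention_2": ModifiedQwen2FlashAttention2,
    "sdpa": ModifiedQwen2SdpaAttention,
}


class ModifiedQwen2DecoderLayer(Qwen2DecoderLayer):
    def __init__(self, config: Qwen2Config, layer_idx: int):
        super().__init__(config, layer_idx)
        # Replace the stock causal self-attention with the non-causal variant.
        self.self_attn = MODIFIED_QWEN2_ATTENTION_CLASSES[config._attn_implementation](
            config, layer_idx
        )


class Qwen2BiModel(Qwen2Model):
    def __init__(self, config: Qwen2Config):
        super().__init__(config)
        # Rebuild the layer stack with the modified decoder layers.
        self.layers = nn.ModuleList(
            [ModifiedQwen2DecoderLayer(config, i) for i in range(config.num_hidden_layers)]
        )
        self.post_init()
```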
I have raised a PR on your repository that implements bidirectional attention for Qwen2 for transformers >= 4.40.0: https://github.com/Iambestfeed/qwen2vec/pull/1. Feel free to merge it; I have tested it and it works on my end.
Thank you so much! It seems I made a mistake when reading Qwen's technical report (they tweaked the Llama architecture, but I overlooked the tweaks). If you want to prepare contrastive fine-tuning with triplet data, I think I can assist you. Contact me on Twitter if you'd like to let me contribute more to this project. Thanks.
Hello, sorry for bothering you! Could I get some guidance on using Qwen2Vec (can you share the Colab link or a checkpoint)? I want to run some experiments with a Chinese LLM. Thanks a lot!
Hmmm, you can check this Colab. Please select a different dataset in the config file, because the Wikipedia dataset will overflow Colab RAM.
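For context, run_mntp.py takes a JSON training config, so switching the dataset means editing the dataset fields there. A hypothetical sketch of the relevant keys is below; the key names follow the Hugging Face run_mlm-style arguments the script accepts, and every value shown (model, dataset, batch size) is only illustrative, so compare against the config file shipped in the repo.

```json
{
  "model_name_or_path": "Qwen/Qwen1.5-1.8B",
  "dataset_name": "wikitext",
  "dataset_config_name": "wikitext-103-raw-v1",
  "per_device_train_batch_size": 1,
  "gradient_checkpointing": true
}
```

WikiText-103 is much smaller than a full Wikipedia dump, so it is less likely to exhaust Colab RAM during preprocessing.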
Closing as it is stale. Feel free to re-open if there are any additional questions related to this issue