llm2vec
Create llm2vec based on the new llama-3-8b
Can't wait to see the results :)
We are on it :)
Llama 3 is full of surprises, so please stay tuned. We are analyzing the results we have and will post an update early next week, as unraveling some of the mysteries is taking time.
💥💥💥 llm2vec-llama-3 is coming out tomorrow 💥💥💥
Is it tomorrow yet?
Yes, it's tomorrow! It's out!
Llama-3 has been added to the model list
Thanks a lot! By the way, the first and the third links point to the Mistral model:
Fixed, thanks!
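For anyone who wants to try it out, here is a minimal usage sketch with the llm2vec Python API. The model IDs below are the Llama-3 entries as they appear in the model list; double-check the exact names on the McGill-NLP Hugging Face page before running.

```python
# Minimal usage sketch: load the MNTP-trained Llama-3 base together with the
# supervised PEFT weights and encode a query/document pair.
# The model IDs are assumptions taken from the model list; verify them on the
# McGill-NLP Hugging Face page.
import torch
from llm2vec import LLM2Vec

l2v = LLM2Vec.from_pretrained(
    "McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    peft_model_name_or_path="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

# Queries are passed as [instruction, text] pairs, documents as plain strings.
queries = [["Retrieve Wikipedia passages that answer the question", "What is llm2vec?"]]
documents = ["LLM2Vec converts decoder-only LLMs into strong text encoders."]

q_reps = l2v.encode(queries)
d_reps = l2v.encode(documents)
print(torch.nn.functional.cosine_similarity(q_reps, d_reps))
```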
@vaibhavad I have prepared an llm2vec version for Qwen2, but there are some problems. Could you walk me through the process of preparing the code for a new model?
Thank you! Did you try to score this model with MTEB?
We did. Here is a Twitter/X thread summarizing our findings. All MTEB scores are also present in the Hugging Face repos of these models. The results should be updated on the MTEB leaderboard in a couple of days.
@Iambestfeed Can you share your PR or working branch?
@vaibhavad Hmmm, you can check this repo https://github.com/Iambestfeed/qwen2vec
Colab here
Bugs:
```
/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py:521: FutureWarning: `is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
warnings.warn(
[INFO|trainer.py:2048] 2024-05-01 15:31:52,965 >> ***** Running training *****
[INFO|trainer.py:2049] 2024-05-01 15:31:52,965 >> Num examples = 822
[INFO|trainer.py:2050] 2024-05-01 15:31:52,965 >> Num Epochs = 3
[INFO|trainer.py:2051] 2024-05-01 15:31:52,965 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2054] 2024-05-01 15:31:52,965 >> Total train batch size (w. parallel, distributed & accumulation) = 1
[INFO|trainer.py:2055] 2024-05-01 15:31:52,965 >> Gradient Accumulation steps = 1
[INFO|trainer.py:2056] 2024-05-01 15:31:52,965 >> Total optimization steps = 2,466
[INFO|trainer.py:2057] 2024-05-01 15:31:52,968 >> Number of trainable parameters = 7,569,408
0% 0/2466 [00:00<?, ?it/s][WARNING|logging.py:329] 2024-05-01 15:31:53,047 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
Traceback (most recent call last):
File "/content/qwen2vec/experiments/run_mntp.py", line 985, in <module>
main()
File "/content/qwen2vec/experiments/run_mntp.py", line 933, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2203, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3138, in training_step
loss = self.compute_loss(model, inputs)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3161, in compute_loss
outputs = model(**inputs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 1169, in forward
outputs = self.model(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 563, in forward
return self.get_base_model()(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2/modeling_qwen2.py", line 1000, in forward
if attention_mask is not None and self._attn_implementation == "flash_attention_2" and use_cache:
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1688, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Qwen2BiModel' object has no attribute '_attn_implementation'. Did you mean: '_autoset_attn_implementation'?
0% 0/2466 [00:00<?, ?it/s]
```
@Iambestfeed It looks like your implementation follows Llama; however, looking at modeling_qwen2.py in the transformers library, Qwen2's implementation is actually closer to Mistral.
Different models differ in how they implement the causal attention mechanism. Hence, the workflow for creating a new bidirectional model requires understanding the native implementation and then overriding the relevant parts.
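For reference, here is a minimal sketch of the first half of that workflow for Qwen2: subclassing the attention variants with the causal flag turned off and swapping them into the decoder layers, mirroring what llm2vec does for Llama and Mistral. Class names are taken from modeling_qwen2.py in transformers >= 4.40; this is not the code from the PR, and the causal-mask construction inside Qwen2Model.forward also has to be made non-causal, which is omitted here.

```python
# Sketch only: swap non-causal attention subclasses into Qwen2's decoder layers.
# Assumes transformers >= 4.40; the attention-mask preparation in
# Qwen2Model.forward still needs to be made non-causal separately.
import torch.nn as nn
from transformers.models.qwen2.modeling_qwen2 import (
    Qwen2Attention,
    Qwen2Config,
    Qwen2DecoderLayer,
    Qwen2FlashAttention2,
    Qwen2Model,
    Qwen2SdpaAttention,
)


class ModifiedQwen2Attention(Qwen2Attention):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.is_causal = False  # disable the causal flag used by the attention kernels


class ModifiedQwen2FlashAttention2(Qwen2FlashAttention2):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.is_causal = False


class ModifiedQwen2SdpaAttention(Qwen2SdpaAttention):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.is_causal = False


MODIFIED_QWEN2_ATTENTION_CLASSES = {
    "eager": ModifiedQwen2Attention,
    "flash_attention_2": ModifiedQwen2FlashAttention2,
    "sdpa": ModifiedQwen2SdpaAttention,
}


class ModifiedQwen2DecoderLayer(Qwen2DecoderLayer):
    def __init__(self, config: Qwen2Config, layer_idx: int):
        super().__init__(config, layer_idx)
        # Replace the stock causal self-attention with the non-causal variant.
        self.self_attn = MODIFIED_QWEN2_ATTENTION_CLASSES[config._attn_implementation](
            config, layer_idx
        )


class Qwen2BiModel(Qwen2Model):
    def __init__(self, config: Qwen2Config):
        super().__init__(config)
        # Rebuild the layer stack with the modified decoder layers.
        self.layers = nn.ModuleList(
            [ModifiedQwen2DecoderLayer(config, i) for i in range(config.num_hidden_layers)]
        )
        self.post_init()
```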
I have raised a PR on your repository that implements bidirectional attention for Qwen2 for transformers >= 4.40.0: https://github.com/Iambestfeed/qwen2vec/pull/1. Feel free to merge it; I have tested it and it works on my end.
Thank you so much! It seems I made a mistake when reading Qwen's technical report (they tweaked the Llama architecture, but I overlooked the tweaks). If you want to prepare contrastive fine-tuning with triplet data, I think I can assist you. Contact me on Twitter if you'd like to let me contribute more to this project. Thanks.
Hello, sorry for bothering you! Could I get some guidance on using Qwen2Vec (can you share the Colab link or a checkpoint)? I want to run some experiments with a Chinese LLM. Thanks a lot!
Hmmm, you can check this Colab. Please select a different dataset in the config file, because the Wikipedia dataset will overflow Colab RAM.
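For context, run_mntp.py takes a JSON training config, so switching the dataset means editing the dataset fields there. A hypothetical sketch of the relevant keys is below; the key names follow the Hugging Face run_mlm-style arguments the script accepts, and every value shown (model, dataset, batch size) is only illustrative, so compare against the config file shipped in the repo.

```json
{
  "model_name_or_path": "Qwen/Qwen1.5-1.8B",
  "dataset_name": "wikitext",
  "dataset_config_name": "wikitext-103-raw-v1",
  "per_device_train_batch_size": 1,
  "gradient_checkpointing": true
}
```

WikiText-103 is much smaller than a full Wikipedia dump, so it is less likely to exhaust Colab RAM during preprocessing.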
Closing as it is stale. Feel free to re-open if there are any additional questions related to this issue