LongLM
Llama-3 is not working.
I followed your directions, quoted below, to apply SelfExtend to Llama-3:
"""
[04/19/2024]:💡 We added the support for LLama-3 with transformers==4.40. To use it with transformers==4.40, you may change the file name of Llama_4_40.py to Llama.py to replace the existing patch file.
"""
I got this error.
"""
Exception                                 Traceback (most recent call last)
Cell In[12], line 4
      2 group_size = 5
      3 window_size = 1024
----> 4 SelfExtend.apply(model, group_size, window_size, enable_flash_attention=True)#, flash_attention_impl='flash_attn')
      5 model.eval()

File /home/ubuntu/reports/SelfExtend.py:109, in apply(loaded_model, group_size, window_size, enable_flash_attention, scale_base, flash_attention_impl)
    107 print("Using triton flash self_extend!!")
    108 if (not modifed):
--> 109     raise Exception(f"Failed to modify the attention method of {arch_name}")
    110 else:
    111     raise Exception(f"Need to set the flash_attention_impl to 'flash_attn' or 'triton'.")

Exception: Failed to modify the attention method of LlamaForCausalLM
"""
How can I fix it?
This exception is raised when the target model instance contains no attention module of the type the patch looks for. A common cause is a mismatch between how the model was loaded and the flags passed to SelfExtend.apply: the model was loaded without flash attention but enable_flash_attention=True was set, or the reverse.
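For example, a minimal sketch of keeping the two sides consistent (the model id, dtype, and loading kwargs here are illustrative, not from the issue):

```python
# Make the way the model is loaded match the flags passed to SelfExtend.apply.
import torch
from transformers import AutoModelForCausalLM
import SelfExtend  # SelfExtend.py from the LongLM repo

model_name = "meta-llama/Meta-Llama-3-8B"  # assumed model id

# Option A: flash attention on both sides.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)
SelfExtend.apply(model, group_size=5, window_size=1024,
                 enable_flash_attention=True, flash_attention_impl="flash_attn")

# Option B: no flash attention on either side.
# model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
# SelfExtend.apply(model, group_size=5, window_size=1024, enable_flash_attention=False)
```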
If possible, check the attention module's class name with a simple print(model) before calling SelfExtend.apply.
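For example (the class names mentioned are the usual Llama attention classes in transformers 4.40; verify against your own output):

```python
# Print the model to see which attention class was actually instantiated,
# e.g. LlamaFlashAttention2 vs. LlamaSdpaAttention / LlamaAttention.
print(model)

# Or list the attention modules programmatically:
for name, module in model.named_modules():
    if name.endswith("self_attn"):
        print(name, "->", type(module).__name__)
```

If the printed class is not a flash-attention variant, either reload the model with flash attention enabled or call SelfExtend.apply with enable_flash_attention=False.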