LongLM

llama3 is not working.

Open · rayjang opened this issue on Apr 26, 2024 · 1 comment

I followed your instructions below to apply SelfExtend to llama3:

"""
[04/19/2024]: 💡 We added the support for LLama-3 with transformers==4.40. To use it with transformers==4.40, you may change the file name of Llama_4_40.py to Llama.py to replace the existing patch file.
"""
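For reference, the rename step can be scripted as below (a minimal sketch; the `self_extend_patch/` directory name is an assumption about the repo layout, so adjust the paths to your checkout):

```python
# Hedged sketch of the README step above: make the transformers==4.40 Llama-3
# patch the active one by overwriting the existing Llama.py patch file.
# The self_extend_patch/ path is an assumption; adjust it to your checkout.
import shutil

shutil.copyfile("self_extend_patch/Llama_4_40.py", "self_extend_patch/Llama.py")
```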

I got this error:

"""
Exception                                 Traceback (most recent call last)
Cell In[12], line 4
      2 group_size = 5
      3 window_size = 1024
----> 4 SelfExtend.apply(model, group_size, window_size, enable_flash_attention=True)#, flash_attention_impl='flash_attn')
      5 model.eval()

File /home/ubuntu/reports/SelfExtend.py:109, in apply(loaded_model, group_size, window_size, enable_flash_attention, scale_base, flash_attention_impl)
    107     print("Using triton flash self_extend!!")
    108     if (not modifed):
--> 109         raise Exception(f"Failed to modify the attention method of {arch_name}")
    110 else:
    111     raise Exception(f"Need to set the flash_attention_impl to 'flash_attn' or 'triton'.")

Exception: Failed to modify the attention method of LlamaForCausalLM
"""

How can I fix it?

rayjang, Apr 26 '24 08:04


This exception is raised when the targeted model instance has no attention module of the expected type. It is usually caused by a mismatch: you loaded the model without flash attention but set enable_flash_attention=True, or the reverse.
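For example, a minimal sketch of a consistent setup (assuming the standard transformers==4.40 loading API; the model id and the group_size/window_size values are just taken from the question):

```python
import torch
from transformers import AutoModelForCausalLM
import SelfExtend  # the patch entry point from this repo

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # example model id

# Option A: load with flash attention 2 AND pass enable_flash_attention=True.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)
SelfExtend.apply(model, group_size=5, window_size=1024,
                 enable_flash_attention=True, flash_attention_impl="flash_attn")

# Option B: load without flash attention and keep the flag off.
# model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
# SelfExtend.apply(model, group_size=5, window_size=1024, enable_flash_attention=False)

model.eval()
```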

If possible, check the attention module's name with a simple print(model) before calling SelfExtend.apply.
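For example (the class names below are what transformers 4.40 typically uses for Llama; treat them as an assumption for your setup):

```python
# Print the module tree and look at the self_attn class inside each decoder layer.
print(model)

# Or check it programmatically (assumes the usual LlamaForCausalLM layout):
print(type(model.model.layers[0].self_attn).__name__)
# "LlamaFlashAttention2"                  -> call apply(..., enable_flash_attention=True)
# "LlamaAttention" / "LlamaSdpaAttention" -> call apply(..., enable_flash_attention=False)
```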

Mooler0410, Apr 26 '24 15:04