LongLM
Llama-3 is not working.
I followed your directions, quoted below, to apply SelfExtend to Llama-3:
"""
[04/19/2024]:💡 We added the support for LLama-3 with transformers==4.40. To use it with transformers==4.40, you may change the file name of Llama_4_40.py to Llama.py to replace the existing patch file.
"""
I got this error.
"""
Exception                                 Traceback (most recent call last)
Cell In[12], line 4
      2 group_size = 5
      3 window_size = 1024
----> 4 SelfExtend.apply(model, group_size, window_size, enable_flash_attention=True)#, flash_attention_impl='flash_attn')
      5 model.eval()

File /home/ubuntu/reports/SelfExtend.py:109, in apply(loaded_model, group_size, window_size, enable_flash_attention, scale_base, flash_attention_impl)
    107 print("Using triton flash self_extend!!")
    108 if (not modifed):
--> 109     raise Exception(f"Failed to modify the attention method of {arch_name}")
    110 else:
    111     raise Exception(f"Need to set the flash_attention_impl to 'flash_attn' or 'triton'.")

Exception: Failed to modify the attention method of LlamaForCausalLM
"""
How can I fix it?
This exception is raised when the target model instance contains no attention module of the type the patch looks for. A common cause is a mismatch between how the model was loaded and the flags passed to SelfExtend.apply: the model was loaded without flash attention but enable_flash_attention=True was set, or the reverse.
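For example, a minimal sketch of keeping the two sides consistent (the model id, dtype, and loading kwargs here are illustrative, not from the issue):

```python
# Make the way the model is loaded match the flags passed to SelfExtend.apply.
import torch
from transformers import AutoModelForCausalLM
import SelfExtend  # SelfExtend.py from the LongLM repo

model_name = "meta-llama/Meta-Llama-3-8B"  # assumed model id

# Option A: flash attention on both sides.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)
SelfExtend.apply(model, group_size=5, window_size=1024,
                 enable_flash_attention=True, flash_attention_impl="flash_attn")

# Option B: no flash attention on either side.
# model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
# SelfExtend.apply(model, group_size=5, window_size=1024, enable_flash_attention=False)
```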
If possible, check the attention module's class name with a simple print(model) before calling SelfExtend.apply.
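For example (the class names mentioned are the usual Llama attention classes in transformers 4.40; verify against your own output):

```python
# Print the model to see which attention class was actually instantiated,
# e.g. LlamaFlashAttention2 vs. LlamaSdpaAttention / LlamaAttention.
print(model)

# Or list the attention modules programmatically:
for name, module in model.named_modules():
    if name.endswith("self_attn"):
        print(name, "->", type(module).__name__)
```

If the printed class is not a flash-attention variant, either reload the model with flash attention enabled or call SelfExtend.apply with enable_flash_attention=False.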