[BUG]: KeyError: 'Cache only has 0 layers, attempted to access layer with index 0' in applications/Colossal-LLaMA-2
🐛 Describe the bug
I am following https://github.com/hpcaitech/ColossalAI/tree/main/applications/Colossal-LLaMA-2 but get the following error:
Flash-attention enabled successfully
Model params: 6.28 B
Booster init max device memory: 38593.54 MB
Booster init max CPU memory: 29757.34 MB
Epoch 0: 0% 0/9198 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 422, in <module>
    main()
  File "train.py", line 346, in main
    batch_output = model(**batch)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/colossalai/booster/plugin/low_level_zero_plugin.py", line 65, in forward
    return super().forward(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/colossalai/interface/model.py", line 25, in forward
    return self.module(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1179, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1018, in forward
    layer_outputs = decoder_layer(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 733, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/work/test_lm/Update_ColossalAI/ColossalAI/applications/Colossal-LLaMA-2/colossal_llama2/utils/flash_attention_patch.py", line 129, in attention_forward
    past_kv_len = past_key_value[0].shape[-2]
  File "/opt/conda/lib/python3.8/site-packages/transformers/cache_utils.py", line 82, in __getitem__
    raise KeyError(f"Cache only has {len(self)} layers, attempted to access layer with index {layer_idx}")
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
Epoch 0: 0% 0/9198 [00:00<?, ?it/s]
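For reference, the failing line `past_kv_len = past_key_value[0].shape[-2]` in flash_attention_patch.py assumes the legacy tuple-style KV cache. Since transformers 4.36, `past_key_value` is a `DynamicCache` object, and indexing an empty cache reproduces exactly this KeyError (a minimal sketch, assuming transformers 4.38.1 as installed here):

```python
# Minimal reproduction of the KeyError, assuming transformers >= 4.36,
# where past_key_values is a Cache object rather than a tuple of tensors.
from transformers.cache_utils import DynamicCache

cache = DynamicCache()   # empty cache: no layers have been written yet
print(len(cache))        # 0
cache[0]                 # KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
```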
After searching on Google, it seems the error comes from a conflict with flash attention. If I disable flash attention in the command, training works, but then flash attention cannot be used. How can this be solved? Thanks.
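A possible direction (an untested sketch on my side, not the official fix; `_past_kv_len` is a hypothetical helper, and `Cache` is the class added in transformers 4.36) would be to make the patched `attention_forward` compute the past length in a way that handles both cache formats:

```python
# Hypothetical helper for flash_attention_patch.py, sketching how
# attention_forward could support both the legacy tuple cache and the
# Cache API introduced in transformers 4.36. Untested.
from typing import Optional, Tuple, Union

import torch
from transformers.cache_utils import Cache


def _past_kv_len(
    past_key_value: Optional[Union[Cache, Tuple[torch.Tensor, torch.Tensor]]],
    layer_idx: int = 0,
) -> int:
    if past_key_value is None:
        return 0
    if isinstance(past_key_value, Cache):
        # An empty DynamicCache reports a length of 0 instead of raising
        # KeyError when indexed.
        return past_key_value.get_seq_length(layer_idx)
    # Legacy format: tuple of (key_states, value_states) with shape
    # [batch, num_heads, seq_len, head_dim].
    return past_key_value[0].shape[-2]


# Inside attention_forward, the failing line would then become something like
# (assuming the attention module carries layer_idx, as it does in 4.36+):
# past_kv_len = _past_kv_len(past_key_value, self.layer_idx)
```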
Environment
My environment:
CUDA: 11.8
PyTorch: 1.13
colossalai: 0.3.5
transformers: 4.38.1
Could you please try downgrading transformers, for example to 4.33.3? There is a breaking change in transformers 4.36.x: past_key_values is now a Cache object rather than a tuple, which the flash attention patch does not handle yet.
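Until the patch is updated, a defensive check could also make the failure clearer (a minimal sketch; the guard, message, and version threshold are my suggestion, not part of ColossalAI):

```python
# Hypothetical guard: fail early with a clear message when the installed
# transformers version uses the Cache API that the patch does not support.
import transformers
from packaging import version

if version.parse(transformers.__version__) >= version.parse("4.36.0"):
    raise RuntimeError(
        "flash_attention_patch.py expects the pre-4.36 tuple-style "
        "past_key_value; install transformers<=4.33.3 or disable the "
        "flash attention patch."
    )
```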
@TongLi3701 I am facing the same problem with transformers 4.36.0 and the colossalai branch feature/update-transformers, which targets transformers 4.36.0.