ChenxinAn
The attention mask you pass in does not match the shape of the encoder outputs; please check this on your side.
Oops! There seems to be a bug in the ``torch_bleu`` function; I only tested the code with ``ngram = 2``. Thank you very much for your suggestions, and I will...
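In case it helps anyone before the fix lands, here is a minimal sketch of an n-gram overlap check that handles arbitrary `ngram` values via `Tensor.unfold`. The function name, signature, and scoring below are illustrative assumptions and may differ from the repository's actual `torch_bleu`.

```python
import torch

def torch_bleu_sketch(ref_ids: torch.Tensor, hyp_ids: torch.Tensor, ngram: int = 2) -> torch.Tensor:
    """Fraction of hypothesis n-grams that also appear in the reference.

    ref_ids / hyp_ids are 1-D LongTensors of token ids. This is only a sketch
    of generic n-gram handling, not the repository's torch_bleu implementation.
    """
    if ref_ids.numel() < ngram or hyp_ids.numel() < ngram:
        return torch.tensor(0.0)
    # unfold(0, ngram, 1) enumerates all contiguous n-grams as rows: (num_ngrams, ngram)
    ref_ngrams = ref_ids.unfold(0, ngram, 1)
    hyp_ngrams = hyp_ids.unfold(0, ngram, 1)
    # For each hypothesis n-gram, check whether it matches any reference n-gram.
    hits = (hyp_ngrams[:, None, :] == ref_ngrams[None, :, :]).all(dim=-1).any(dim=-1)
    return hits.float().mean()

# Example: with ngram=3, one of the two hypothesis trigrams matches -> 0.5
print(torch_bleu_sketch(torch.tensor([1, 2, 3, 4]), torch.tensor([2, 3, 4, 5]), ngram=3))
```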
I have updated the code! Thank you again for your effort!
Yes! We use `flash_attn_func` for better efficiency and simplicity. Changing to `flash_attn_varlen_func` should not be difficult. If you encounter any difficulties, please feel free to leave a comment.
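In case it helps, here is a rough, untested sketch of what the switch could look like for a padded batch. The tensor names and the unpad/repad bookkeeping are my assumptions, not the repository's code.

```python
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_varlen_func

def varlen_attention(q, k, v, attention_mask, causal=True):
    """Sketch: route a padded batch through flash_attn_varlen_func.

    q, k, v: (batch, seqlen, n_heads, head_dim), fp16/bf16 tensors on CUDA.
    attention_mask: (batch, seqlen), 1 for real tokens and 0 for padding.
    """
    batch, seqlen, n_heads, head_dim = q.shape
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)           # tokens per sequence
    cu_seqlens = F.pad(seqlens.cumsum(0, dtype=torch.int32), (1, 0))  # (batch + 1,), starts at 0
    max_seqlen = int(seqlens.max())

    # Drop padded positions and flatten to (total_tokens, n_heads, head_dim).
    keep = attention_mask.bool().reshape(-1)
    q_flat = q.reshape(-1, n_heads, head_dim)[keep]
    k_flat = k.reshape(-1, n_heads, head_dim)[keep]
    v_flat = v.reshape(-1, n_heads, head_dim)[keep]

    out = flash_attn_varlen_func(
        q_flat, k_flat, v_flat,
        cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
        max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
        causal=causal,
    )

    # Scatter the unpadded output back into a padded (batch, seqlen, ...) tensor.
    padded = out.new_zeros(batch * seqlen, n_heads, head_dim)
    padded[keep] = out
    return padded.reshape(batch, seqlen, n_heads, head_dim)
```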
Hi, thank you for your interest! DCA can be applied to almost all LLMs released on Hugging Face. If you find it challenging for a specific model, please feel free...
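As a general pattern, something like the sketch below should work. The patch module and function names (`chunkllama_attn_replace`, `replace_with_chunkllama`) and the `pretraining_length` argument are assumptions here; please check them against this repository for your model family.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed import; check the repository for the exact patch module / function names.
from chunkllama_attn_replace import replace_with_chunkllama

# Apply the DCA patch BEFORE instantiating the model so that the Hugging Face
# attention classes are replaced globally.
replace_with_chunkllama(pretraining_length=4096)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="flash_attention_2",  # so the flash-attention forward is the one patched
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```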
I do not have a strong background in CUDA programming, so it might take a long time...
Please add `attn_implementation="flash_attention_2"` when loading the model ([Line 265](https://github.com/HKUNLP/ChunkLlama/blob/main/fine-tune/train_chunkllama_16k.py#L265)).
This error is caused by `LlamaFlashAttention2.forward` not being correctly replaced by the new forward function.
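A quick way to confirm the replacement took effect is sketched below; it assumes the standard `LlamaForCausalLM` module layout and a `model` loaded with `from_pretrained` after the patch was applied.

```python
# Inspect the first decoder layer's attention module.
attn = model.model.layers[0].self_attn

# Should print the FlashAttention2 variant; if it prints LlamaAttention or
# LlamaSdpaAttention, attn_implementation="flash_attention_2" was not passed.
print(type(attn).__name__)

# Should point to the DCA patch module; if it still points to
# transformers.models.llama.modeling_llama, the forward was not replaced
# (e.g. the patch was applied after the model was instantiated).
fwd = getattr(attn.forward, "__func__", attn.forward)
print(fwd.__module__)
```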
The code for these baselines is coming soon.
Thank you for bringing this to our attention. Unfortunately, the current version of vLLM does not support returning attention scores. However, we are pleased to inform you that...