ChenxinAn
The attention mask you pass in does not match the shape of the encoder outputs; please check this on your side.
Oops! There seems to be a bug in the ``torch_bleu`` function; I only tested the code with ``ngram = 2``. Thank you very much for your suggestions, and I will...
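In case it helps anyone before the fix lands, here is a minimal sketch of an n-gram overlap check that handles arbitrary `ngram` values via `Tensor.unfold`. The function name, signature, and scoring below are illustrative assumptions and may differ from the repository's actual `torch_bleu`.

```python
import torch

def torch_bleu_sketch(ref_ids: torch.Tensor, hyp_ids: torch.Tensor, ngram: int = 2) -> torch.Tensor:
    """Fraction of hypothesis n-grams that also appear in the reference.

    ref_ids / hyp_ids are 1-D LongTensors of token ids. This is only a sketch
    of generic n-gram handling, not the repository's torch_bleu implementation.
    """
    if ref_ids.numel() < ngram or hyp_ids.numel() < ngram:
        return torch.tensor(0.0)
    # unfold(0, ngram, 1) enumerates all contiguous n-grams as rows: (num_ngrams, ngram)
    ref_ngrams = ref_ids.unfold(0, ngram, 1)
    hyp_ngrams = hyp_ids.unfold(0, ngram, 1)
    # For each hypothesis n-gram, check whether it matches any reference n-gram.
    hits = (hyp_ngrams[:, None, :] == ref_ngrams[None, :, :]).all(dim=-1).any(dim=-1)
    return hits.float().mean()

# Example: with ngram=3, one of the two hypothesis trigrams matches -> 0.5
print(torch_bleu_sketch(torch.tensor([1, 2, 3, 4]), torch.tensor([2, 3, 4, 5]), ngram=3))
```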
I have updated the code! Thank you again for your effort!
Yes! We use `flash_attn_func` for better efficiency and simplicity. Changing to `flash_attn_varlen_func` should not be difficult. If you encounter any difficulties, please feel free to leave a comment.
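In case it helps, here is a rough, untested sketch of what the switch could look like for a padded batch. The tensor names and the unpad/repad bookkeeping are my assumptions, not the repository's code.

```python
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_varlen_func

def varlen_attention(q, k, v, attention_mask, causal=True):
    """Sketch: route a padded batch through flash_attn_varlen_func.

    q, k, v: (batch, seqlen, n_heads, head_dim), fp16/bf16 tensors on CUDA.
    attention_mask: (batch, seqlen), 1 for real tokens and 0 for padding.
    """
    batch, seqlen, n_heads, head_dim = q.shape
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)           # tokens per sequence
    cu_seqlens = F.pad(seqlens.cumsum(0, dtype=torch.int32), (1, 0))  # (batch + 1,), starts at 0
    max_seqlen = int(seqlens.max())

    # Drop padded positions and flatten to (total_tokens, n_heads, head_dim).
    keep = attention_mask.bool().reshape(-1)
    q_flat = q.reshape(-1, n_heads, head_dim)[keep]
    k_flat = k.reshape(-1, n_heads, head_dim)[keep]
    v_flat = v.reshape(-1, n_heads, head_dim)[keep]

    out = flash_attn_varlen_func(
        q_flat, k_flat, v_flat,
        cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
        max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
        causal=causal,
    )

    # Scatter the unpadded output back into a padded (batch, seqlen, ...) tensor.
    padded = out.new_zeros(batch * seqlen, n_heads, head_dim)
    padded[keep] = out
    return padded.reshape(batch, seqlen, n_heads, head_dim)
```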
Hi, thank you for your interest! DCA can be applied to almost all LLMs released on Hugging Face. If you find it challenging for a specific model, please feel free...
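As a general pattern, something like the sketch below should work. The patch module and function names (`chunkllama_attn_replace`, `replace_with_chunkllama`) and the `pretraining_length` argument are assumptions here; please check them against this repository for your model family.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed import; check the repository for the exact patch module / function names.
from chunkllama_attn_replace import replace_with_chunkllama

# Apply the DCA patch BEFORE instantiating the model so that the Hugging Face
# attention classes are replaced globally.
replace_with_chunkllama(pretraining_length=4096)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="flash_attention_2",  # so the flash-attention forward is the one patched
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```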
I do not have a strong background in CUDA programming, so it might take a long time...
Please add `attn_implementation="flash_attention_2"` when loading the model ([Line 265](https://github.com/HKUNLP/ChunkLlama/blob/main/fine-tune/train_chunkllama_16k.py#L265)).
This error is caused by `LlamaFlashAttention2.forward` not being correctly replaced by the new forward function.
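A quick way to confirm the replacement took effect is sketched below; it assumes the standard `LlamaForCausalLM` module layout and a `model` loaded with `from_pretrained` after the patch was applied.

```python
# Inspect the first decoder layer's attention module.
attn = model.model.layers[0].self_attn

# Should print the FlashAttention2 variant; if it prints LlamaAttention or
# LlamaSdpaAttention, attn_implementation="flash_attention_2" was not passed.
print(type(attn).__name__)

# Should point to the DCA patch module; if it still points to
# transformers.models.llama.modeling_llama, the forward was not replaced
# (e.g. the patch was applied after the model was instantiated).
fwd = getattr(attn.forward, "__func__", attn.forward)
print(fwd.__module__)
```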
The code for these baselines is coming soon.
Thank you for bringing this to our attention. Unfortunately, the current version of vLLM does not support returning attention scores. However, we are pleased to inform you that...