Hi, for training, to reproduce the results, please disable the GT-sampling augmentation in the last 5 epochs. This is a small but important trick listed in the implementation details. For testing, sorry for...
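A minimal sketch of the trick above, showing only the epoch-based toggle; the constants and the `gt_sampling_enabled` helper are illustrative and not the repo's actual config API:

```python
# Illustrative only: disable GT-sampling augmentation for the final 5 epochs.
TOTAL_EPOCHS = 80        # assumed total number of training epochs
DISABLE_LAST_N = 5       # augmentation is switched off for these last epochs

def gt_sampling_enabled(epoch: int) -> bool:
    """Return True while the GT-sampling augmentation should stay on."""
    return epoch < TOTAL_EPOCHS - DISABLE_LAST_N

if __name__ == "__main__":
    for epoch in range(TOTAL_EPOCHS):
        print(epoch, gt_sampling_enabled(epoch))
```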
Yes, I think so.
Hi, thanks. I think your modification is right. Could you please also check other batch sizes, such as 3 or 4?
Hi, many thanks for your interest in our work. Let's walk through a step-by-step example to understand this flash-attention version of the implementation. (1) To understand the flash-attention implementation, take batch size =...
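To make the grouping concrete, here is a minimal sketch of the shifted-sparse (S^2) attention layout, assuming batch size 1 and made-up sizes; it only illustrates the head-shift and group-into-batch reshape, and is not a copy of the repo's `forward_flashattn`:

```python
import torch

# Illustrative sizes, not the model's real configuration.
bsz, seq_len, n_heads, head_dim = 1, 4096, 8, 64
group_size = 1024                      # each group attends within 1024 tokens

qkv = torch.randn(bsz, seq_len, 3, n_heads, head_dim)

# Shift the second half of the heads by half a group so information can
# flow across neighbouring groups.
qkv[:, :, :, n_heads // 2:] = qkv[:, :, :, n_heads // 2:].roll(-group_size // 2, dims=1)

# Fold the groups into the batch dimension: flash-attention then runs
# standard full attention independently inside every group.
n_groups = seq_len // group_size
qkv_grouped = qkv.reshape(bsz * n_groups, group_size, 3, n_heads, head_dim)

print(qkv_grouped.shape)  # torch.Size([4, 1024, 3, 8, 64])
```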
Hi, you can simply replace the forward function with this one at inference time: https://github.com/dvlab-research/LongLoRA/blob/39866afea5cdc7698f12c11236149727fdc22e31/llama_attn_replace_sft.py#L24 A PR yesterday already fixed this: https://github.com/dvlab-research/LongLoRA/pull/114 Regards, Yukang Chen
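For reference, replacing the forward at inference time is usually done by monkey-patching the attention class before loading the model. A rough sketch, assuming `forward_flashattn_inference` is importable from the linked file (the exact name and path may differ across repo versions):

```python
import transformers

# From the LongLoRA repo; the function name follows the discussion above
# and is an assumption here.
from llama_attn_replace_sft import forward_flashattn_inference

# Patch before the model is instantiated so all attention layers pick it up.
transformers.models.llama.modeling_llama.LlamaAttention.forward = forward_flashattn_inference

# Afterwards, load the model and run generation as usual.
```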
@coranholmes Hi, forward_flashattn_inference is standard attention inference, not S^2 attention; the default is fine. @hxs91 Hi, I indeed have not implemented S^2 attention with KV-cache inference yet. The current forward_flashattn version no longer requires padding or worrying about divisibility, so you can give it a try.
Hi, since you ran SFT with QLoRA, please try [inference-qlora.py](https://github.com/dvlab-research/LongLoRA/blob/main/inference-qlora.py) for inference and see whether it works.
Hi, could you try the model I trained with QLoRA and check whether inference works normally? That will tell us whether the problem is in fine-tuning or in inference. https://huggingface.co/Yukang/LongAlpaca-7B-qlora-weights/tree/main After downloading these weights, you need to run the [merge](https://github.com/dvlab-research/LongLoRA#merge-lora-weight) script to obtain the complete model.
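For a sense of what the merge step produces, here is a rough sketch using the generic peft API (not the repo's own merge script), assuming a Llama-2-7B base; the repo's script may additionally handle details such as resized embeddings for special tokens, so prefer it for real use:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumed base model for the 7B QLoRA weights.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)

# Attach the LoRA adapter and fold its deltas into the base weights.
model = PeftModel.from_pretrained(base, "Yukang/LongAlpaca-7B-qlora-weights")
model = model.merge_and_unload()

# Save a standalone model that no longer needs the adapter at load time.
model.save_pretrained("LongAlpaca-7B-merged")
AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf").save_pretrained("LongAlpaca-7B-merged")
```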
Inference with this model works fine on my side. You can check whether the length of text.txt exceeds 32k, or directly use https://huggingface.co/Yukang/LongAlpaca-7B, which does not require merging LoRA weights.
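A quick way to check the token length of text.txt before running inference; any Llama-family tokenizer gives the same count:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Yukang/LongAlpaca-7B")
with open("text.txt", "r", encoding="utf-8") as f:
    n_tokens = len(tok(f.read())["input_ids"])

# The model was fine-tuned for a 32k context, so stay below that.
print(f"{n_tokens} tokens (context limit: 32768)")
```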
Hi, many thanks for your question. Actually, we have not conducted these experiments yet. Regards, Yukang Chen