Hi, for training, to reproduce the results, please disable the GT-sampling augmentation in the last 5 epochs. This is a small but important trick listed in the implementation details. For testing, sorry for...
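A minimal sketch of the trick above, showing only the epoch-based toggle; the constants and the `gt_sampling_enabled` helper are illustrative and not the repo's actual config API:

```python
# Illustrative only: disable GT-sampling augmentation for the final 5 epochs.
TOTAL_EPOCHS = 80        # assumed total number of training epochs
DISABLE_LAST_N = 5       # augmentation is switched off for these last epochs

def gt_sampling_enabled(epoch: int) -> bool:
    """Return True while the GT-sampling augmentation should stay on."""
    return epoch < TOTAL_EPOCHS - DISABLE_LAST_N

if __name__ == "__main__":
    for epoch in range(TOTAL_EPOCHS):
        print(epoch, gt_sampling_enabled(epoch))
```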
Yes, I think so.
Hi, thanks. I think your modification is right. Could you please also check other batch sizes, such as 3 or 4?
Hi, many thanks for your interest in our work. Let's walk through a step-by-step example to understand this flash-attention version of the implementation. (1) To understand the flash-attention implementation, take batch size =...
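To make the grouping concrete, here is a minimal sketch of the shifted-sparse (S^2) attention layout, assuming batch size 1 and made-up sizes; it only illustrates the head-shift and group-into-batch reshape, and is not a copy of the repo's `forward_flashattn`:

```python
import torch

# Illustrative sizes, not the model's real configuration.
bsz, seq_len, n_heads, head_dim = 1, 4096, 8, 64
group_size = 1024                      # each group attends within 1024 tokens

qkv = torch.randn(bsz, seq_len, 3, n_heads, head_dim)

# Shift the second half of the heads by half a group so information can
# flow across neighbouring groups.
qkv[:, :, :, n_heads // 2:] = qkv[:, :, :, n_heads // 2:].roll(-group_size // 2, dims=1)

# Fold the groups into the batch dimension: flash-attention then runs
# standard full attention independently inside every group.
n_groups = seq_len // group_size
qkv_grouped = qkv.reshape(bsz * n_groups, group_size, 3, n_heads, head_dim)

print(qkv_grouped.shape)  # torch.Size([4, 1024, 3, 8, 64])
```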
Hi, you can simply replace the forward function with this one at inference time: https://github.com/dvlab-research/LongLoRA/blob/39866afea5cdc7698f12c11236149727fdc22e31/llama_attn_replace_sft.py#L24 A PR yesterday already fixed this: https://github.com/dvlab-research/LongLoRA/pull/114 Regards, Yukang Chen
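For reference, replacing the forward at inference time is usually done by monkey-patching the attention class before loading the model. A rough sketch, assuming `forward_flashattn_inference` is importable from the linked file (the exact name and path may differ across repo versions):

```python
import transformers

# From the LongLoRA repo; the function name follows the discussion above
# and is an assumption here.
from llama_attn_replace_sft import forward_flashattn_inference

# Patch before the model is instantiated so all attention layers pick it up.
transformers.models.llama.modeling_llama.LlamaAttention.forward = forward_flashattn_inference

# Afterwards, load the model and run generation as usual.
```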
@coranholmes Hi, forward_flashattn_inference is standard attention inference, not S^2 attention; the default is fine. @hxs91 Hi, I indeed have not implemented S^2 attention with KV-cache inference yet. The current forward_flashattn version no longer requires padding or worrying about divisibility, so you can give it a try.
Hi, since you ran SFT with QLoRA, please try [inference-qlora.py](https://github.com/dvlab-research/LongLoRA/blob/main/inference-qlora.py) for inference and see whether it works.
Hi, could you try the model I trained with QLoRA and check whether inference works normally? That will tell us whether the problem is in fine-tuning or in inference. https://huggingface.co/Yukang/LongAlpaca-7B-qlora-weights/tree/main After downloading these weights, you need to run the [merge](https://github.com/dvlab-research/LongLoRA#merge-lora-weight) script to obtain the complete model.
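For a sense of what the merge step produces, here is a rough sketch using the generic peft API (not the repo's own merge script), assuming a Llama-2-7B base; the repo's script may additionally handle details such as resized embeddings for special tokens, so prefer it for real use:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumed base model for the 7B QLoRA weights.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)

# Attach the LoRA adapter and fold its deltas into the base weights.
model = PeftModel.from_pretrained(base, "Yukang/LongAlpaca-7B-qlora-weights")
model = model.merge_and_unload()

# Save a standalone model that no longer needs the adapter at load time.
model.save_pretrained("LongAlpaca-7B-merged")
AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf").save_pretrained("LongAlpaca-7B-merged")
```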
Inference with this model works fine on my side. You can check whether the length of text.txt exceeds 32k, or directly use https://huggingface.co/Yukang/LongAlpaca-7B, which does not require merging LoRA weights.
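A quick way to check the token length of text.txt before running inference; any Llama-family tokenizer gives the same count:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Yukang/LongAlpaca-7B")
with open("text.txt", "r", encoding="utf-8") as f:
    n_tokens = len(tok(f.read())["input_ids"])

# The model was fine-tuned for a 32k context, so stay below that.
print(f"{n_tokens} tokens (context limit: 32768)")
```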
Hi, many thanks for your question. Actually, we have not conducted these experiments yet. Regards, Yukang Chen