Kingsley

Results 80 comments of Kingsley

> cc [@Kuangdd01](https://github.com/Kuangdd01) could you verify this?

I'll check it later.

I suspect some configurations of the original `InternVL-Chat` have not been updated for the new `video_processor` feature. A quick workaround: ``` 1. cd your_transformers_dir 2. git checkout -b internvl_dev...

Hi, did you modify the `qwen template` in the source code? The output is normal when I test in my environment. ``` top_p=0.01, temperature=0.01, torch=2.6.0+cu12.4 GPU=V100*4, llamafactory=0.9.3.dev0 ``` ![Image](https://github.com/user-attachments/assets/5db7742b-d7bd-4639-aab1-0b1852900911)

`USE_MCA=1` is currently required to launch an mcore job.

Phi-3 uses a custom Python file for its model config (e.g., [configuration_phi3.py](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/tree/main)), which triggers the `get_relative_imports` function in the Transformers library and results in this path error. Try this [solution](https://github.com/hiyouga/LLaMA-Factory/issues/6411#issuecomment-2557883002)...

Can you share your training scripts? I remember that we have tested this model on the `mllm_demo` dataset.

Sorry for the late reply. I have reproduced this issue. It is a common problem when using dsz3 with an MoE model; see, for example, https://github.com/deepspeedai/DeepSpeed/issues/5066. To avoid the gradient disagreement...

Could you please share your method for feeding the fake gradients when using dsz3? BTW, we have added fake images to the pure-text batches. It still gets stuck in...

I don't think it is the root cause. Can you confirm which step raises this issue?

> My dataset has samples containing 1/2 images. When training under dsz2, it gets stuck. Training machine: 32*A100

Can you use `py-spy` to locate the issue? I can't reproduce it...