Kingsley comments

Results: 80 comments by Kingsley

Error when training DeepSeek V3 on the latest master branch with DeepseekV3MoE set as a leaf node

> cc [@Kuangdd01](https://github.com/Kuangdd01) could you verify this?

I'll check it later.

[Convert HF] TypeError: Received a NoneType for argument video_processor, but a BaseVideoProcessor was expected.

I guess some configurations of the original `InternVL-Chat` have not been updated for the new `video_processor` feature. A quick workaround: ``` 1. cd your_transformers_dir 2. git checkout -b internvl_dev...

Garbled responses when loading Qwen2.5-32B-Instruct with Hugging Face

Hi, did you modify the `qwen template` in the source code? In my environment the test output is normal. ``` top_p=0.01, temperature=0.01, torch=2.6.0+cu12.4 GPU=V100*4, llamafactory=0.9.3.dev0 ``` ![Image](https://github.com/user-attachments/assets/5db7742b-d7bd-4639-aab1-0b1852900911)

llamafactory-cli train examples/megatron/qwen3_moe_full.yaml error

Currently, `USE_MCA=1` must be set to launch an mcore job.
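A minimal sketch of that launch, assuming the `llamafactory-cli train` command and config path from the issue title and the `USE_MCA` environment variable from the reply. The helper names `build_mcore_launch` and `launch` are hypothetical and only illustrate setting the variable for the child process.

```python
import os
import subprocess


def build_mcore_launch(config_path, base_env=None):
    """Return (argv, env) for the training command with USE_MCA=1 set.

    Hypothetical helper: it only prepares the command, it does not run it.
    """
    # Copy the environment so the caller's mapping is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["USE_MCA"] = "1"
    argv = ["llamafactory-cli", "train", config_path]
    return argv, env


def launch(config_path):
    """Run the training command; raises CalledProcessError on failure."""
    argv, env = build_mcore_launch(config_path)
    return subprocess.run(argv, env=env, check=True)
```

Calling `launch("examples/megatron/qwen3_moe_full.yaml")` should be equivalent to prefixing the shell command with `USE_MCA=1`.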

After training the phi3 model, export error

Phi-3 uses a custom Python file for its model config (e.g., [configuration_phi3.py](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/tree/main)), which triggers the `get_relative_imports` function in the Transformers library and results in this path error. Try this [solution](https://github.com/hiyouga/LLaMA-Factory/issues/6411#issuecomment-2557883002)...

Training hangs when fine-tuning kimi-vl with multi-image data

Can you share your training scripts? I remember that we have tested this model on the `mllm_demo` dataset.

Training hangs when fine-tuning kimi-vl with multi-image data

Sorry for the late reply; I have reproduced this issue. It is a common problem when using dsz3 with an MoE model; see, for example, https://github.com/deepspeedai/DeepSpeed/issues/5066. To avoid the gradient disagreement...

Training hangs when fine-tuning kimi-vl with multi-image data

Could you please share your method for feeding the fake gradients when using dsz3? BTW, we have added fake images into the pure text batch. It still gets stuck in...

Training hangs when fine-tuning kimi-vl with multi-image data

I don't think it is the root cause. Can you confirm which step raises this issue?

Training hangs when fine-tuning kimi-vl with multi-image data

> My dataset has samples containing 1/2 images. When training under dsz2, it gets stuck. Training machine: 32*A100

Can you use `py-spy` to locate the issue? I can't reproduce it...
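One way to follow the `py-spy` suggestion is to dump the Python stack of each hung training process and compare where the ranks are blocked. The sketch below assumes `py-spy` is installed and uses its `dump --pid` subcommand; the helper names and any PIDs passed in are hypothetical, and attaching to a process may need elevated privileges on some systems.

```python
import subprocess


def py_spy_dump_commands(pids):
    """Build one `py-spy dump --pid <PID>` command per process ID."""
    commands = []
    for pid in pids:
        if pid <= 0:
            raise ValueError(f"invalid PID: {pid}")
        commands.append(["py-spy", "dump", "--pid", str(pid)])
    return commands


def dump_stacks(pids):
    """Run py-spy on each PID and return {pid: captured output}."""
    pids = list(pids)  # allow any iterable, since it is traversed twice
    traces = {}
    for pid, cmd in zip(pids, py_spy_dump_commands(pids)):
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Fall back to stderr so permission errors are still visible.
        traces[pid] = result.stdout or result.stderr
    return traces
```

Running `dump_stacks` on the PIDs of the stuck ranks on one node shows which call each process is waiting in, which helps tell a collective-communication hang apart from a data-loading stall.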