FlyCarrot issues

Results: 4 issues of FlyCarrot

DeepSeek V2 training in shard mode reports missing keys such as w1w3 when loading weights

The error is shown below. The model is DeepSeek-V2-Lite and the shard count is 8. ``` model.layers.7.mlp.experts.3.w1w3 not in state_dict, loading deepseek-ai/DeepSeek-V2-Lite/model-00002-of-000004.safetensors ```

print_on_rank0() raises an error when converting the DeepSeek V2 Lite model

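As the title says, converting an MoE model goes through xtuner/xtuner/utils/handle_moe_load_and_save.py, whose print_on_rank0 function is:

```python
def print_on_rank0(info):
    if dist.get_rank() == 0:
        print_log(info, 'current')
```

This call depends on the multi-GPU (distributed) process group being initialized, but convert never initializes it, so an error is raised. I suggest changing the code so that convert prints directly instead of checking the rank.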
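One possible shape for that change, as a minimal sketch and not xtuner's actual fix: check whether a process group exists before asking for the rank, and print unconditionally when it does not. The `is_rank0` helper and the `log_fn` parameter are hypothetical names introduced here; `log_fn` defaults to the built-in `print` as a stand-in for `print_log(info, 'current')`, and the guarded import is only there so the sketch also runs where torch is not installed.

```python
try:
    import torch.distributed as dist
except ImportError:  # assumption: allow the sketch to run without torch
    dist = None


def is_rank0():
    """Return True on rank 0, or when no process group is initialized."""
    # Hypothetical helper: during convert nothing calls init_process_group,
    # so dist.get_rank() would raise; treat that case as "rank 0".
    if dist is None or not dist.is_available() or not dist.is_initialized():
        return True
    return dist.get_rank() == 0


def print_on_rank0(info, log_fn=print):
    # log_fn stands in for xtuner's print_log(info, 'current').
    if is_rank0():
        log_fn(info)
```

With this guard, a single-process convert run prints every message, while a distributed training run still logs only on rank 0.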

Could you explain "select sentences according to the logits changed"?

Hi, this is a good code repo, but I can't find the code that calculates the change in logits. Could you point me to the relevant line? Thanks a lot!

[Bug]: Can't use yarn rope config for long context in Qwen2 model

### Your current environment The output of `python collect_env.py` ```text Collecting environment information... PyTorch version: 2.4.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build...

bug
