BruceYu-Bit

Results 20 comments of BruceYu-Bit

The model weights are invalid; can you provide some help?

> > Installing GroupedGEMM following the official docs, the build fails. The error is below: Running setup.py clean for grouped_gemm Running command python setup.py clean /root/anaconda3/envs/xtuner/lib/python3.10/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated. !!
> >
> > ```
> > ********************************************************************************
> > Please consider...
> > ```

> Could you check your nvcc version? For example, run:
>
> ```
> nvcc --version
> ```
>
> I suspect this is a CUDA toolkit version issue.

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023...
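To double-check the toolkit version programmatically, the `nvcc --version` output can be parsed like this. This is only an illustrative sketch; the helper names are mine and not from any project:

```python
import re
import subprocess

def cuda_release(nvcc_output):
    """Extract the CUDA release number (e.g. '12.8') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None

def local_cuda_release():
    """Run nvcc and return its reported release, or None if nvcc is unavailable."""
    try:
        out = subprocess.run(
            ["nvcc", "--version"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return cuda_release(out)

# Example against a typical nvcc output line:
sample = "Cuda compilation tools, release 12.8, V12.8.61"
print(cuda_release(sample))  # → 12.8
```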

> The CUDA toolkit version we build successfully with is CUDA 12.8. We recommend using this version [@BruceYu-Bit](https://github.com/BruceYu-Bit)
> […](#)

Thanks! CUDA 12.8 fixes it.

I also ran into this problem. Apart from that, memory consumption is also higher than InternVL2.5, about double, which is annoying. And flash attention was not supported... @Weiyun1025...

> Thank you for your interest in our work. Our [training script](https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat_gpt_oss/shell/internvl3_5_qwen3/internvl3_5_4b_sft.sh#L85) for the 4B model sets `use_packed_ds=True` by default, which packs multiple samples into a single sequence. In such cases,...
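The packing behavior described in the quoted reply can be sketched with a minimal greedy packer. Only `max_packed_tokens` is a name taken from the discussion; everything else here is illustrative and not from the InternVL codebase:

```python
def pack_sequences(sample_lengths, max_packed_tokens):
    """Greedily group samples (given by token length) into packed sequences,
    each staying within the max_packed_tokens budget."""
    packs, current, current_len = [], [], 0
    for idx, length in enumerate(sample_lengths):
        if length > max_packed_tokens:
            raise ValueError(f"sample {idx} ({length} tokens) exceeds the pack limit")
        if current_len + length > max_packed_tokens:
            # Current pack is full; start a new one.
            packs.append(current)
            current, current_len = [], 0
        current.append(idx)
        current_len += length
    if current:
        packs.append(current)
    return packs

# With a 10-token budget, samples of lengths [4, 5, 3, 6]
# pack into two sequences: [[0, 1], [2, 3]].
print(pack_sequences([4, 5, 3, 6], max_packed_tokens=10))
```

This also shows why a smaller `max_packed_tokens` lowers per-sequence memory: each packed sequence is capped at that budget, at the cost of more (shorter) sequences.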

> You can try setting `max_packed_tokens` and `num_images_expected` smaller. BTW, our codebase supports flash attention; can you share more information about the issue you encountered when using flash...