BruceYu-Bit

Results 20 comments of BruceYu-Bit

The model weights are invalid; can you provide some help?

> > Installing GroupedGEMM following the official docs, the build fails. The error is below: Running setup.py clean for grouped_gemm Running command python setup.py clean /root/anaconda3/envs/xtuner/lib/python3.10/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated. !!
> >
> > ```
> > ********************************************************************************
> > Please consider...
> > ```

> Could you check your nvcc version? For example, run:
>
> ```
> nvcc --version
> ```
>
> I suspect this is a CUDA toolkit version issue.

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023...
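To double-check the toolkit version programmatically, the `nvcc --version` output can be parsed like this. This is only an illustrative sketch; the helper names are mine and not from any project:

```python
import re
import subprocess

def cuda_release(nvcc_output):
    """Extract the CUDA release number (e.g. '12.8') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None

def local_cuda_release():
    """Run nvcc and return its reported release, or None if nvcc is unavailable."""
    try:
        out = subprocess.run(
            ["nvcc", "--version"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return cuda_release(out)

# Example against a typical nvcc output line:
sample = "Cuda compilation tools, release 12.8, V12.8.61"
print(cuda_release(sample))  # → 12.8
```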

> The CUDA toolkit version we build successfully with is CUDA 12.8. We recommend using this version [@BruceYu-Bit](https://github.com/BruceYu-Bit)
> […](#)

Thanks! CUDA 12.8 fixes it.

I also ran into this problem. Apart from that, memory consumption is also higher than InternVL2.5, about double, which is annoying. And flash attention was not supported... @Weiyun1025...

> Thank you for your interest in our work. Our [training script](https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat_gpt_oss/shell/internvl3_5_qwen3/internvl3_5_4b_sft.sh#L85) for the 4B model sets `use_packed_ds=True` by default, which packs multiple samples into a single sequence. In such cases,...
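The packing behavior described in the quoted reply can be sketched with a minimal greedy packer. Only `max_packed_tokens` is a name taken from the discussion; everything else here is illustrative and not from the InternVL codebase:

```python
def pack_sequences(sample_lengths, max_packed_tokens):
    """Greedily group samples (given by token length) into packed sequences,
    each staying within the max_packed_tokens budget."""
    packs, current, current_len = [], [], 0
    for idx, length in enumerate(sample_lengths):
        if length > max_packed_tokens:
            raise ValueError(f"sample {idx} ({length} tokens) exceeds the pack limit")
        if current_len + length > max_packed_tokens:
            # Current pack is full; start a new one.
            packs.append(current)
            current, current_len = [], 0
        current.append(idx)
        current_len += length
    if current:
        packs.append(current)
    return packs

# With a 10-token budget, samples of lengths [4, 5, 3, 6]
# pack into two sequences: [[0, 1], [2, 3]].
print(pack_sequences([4, 5, 3, 6], max_packed_tokens=10))
```

This also shows why a smaller `max_packed_tokens` lowers per-sequence memory: each packed sequence is capped at that budget, at the cost of more (shorter) sequences.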

> You can try setting `max_packed_tokens` and `num_images_expected` smaller. BTW, our codebase supports flash attention; can you share more information about the issue you encountered when using flash...