sunzx8 comments

Results: 22 comments of sunzx8

Fine-tuning internvl-v1.5 raises KeyError: 'input_ids'

I would also like to ask: how should I configure things to fine-tune across two machines with 16 GPUs in total?

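Fine-tuning internvl-v1.5 raises KeyError: 'input_ids'

> A CUDA error; it may be OOM or a CUDA environment problem.
>
> There is a multi-node, multi-GPU example in the README.

One more question: with the LoRA fine-tuning method you provided, the parameter summary shows that only a small number of parameters are trained, yet GPU memory consumption is exactly the same as with full-parameter fine-tuning. Does this mean the model was not actually converted? The actual memory consumption is 241 GB, the same as full-parameter fine-tuning on coco-mini.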

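Fine-tuning internvl-v1.5 raises KeyError: 'input_ids'

> > A CUDA error; it may be OOM or a CUDA environment problem.
> >
> > There is a multi-node, multi-GPU example in the README.
>
> One more question: with the LoRA fine-tuning method you provided, the parameter summary shows that only a small number of parameters are trained, yet GPU memory consumption is exactly the same as with full-parameter fine-tuning. Does this mean the model was not actually converted? The actual memory consumption is 241 GB, the same as full-parameter fine-tuning on coco-mini.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft --model_type internvl-chat-v1_5 --model_id_or_path /dev/shm/shawn/hf_ms_model/InternVL-Chat-V1-5 --dataset coco-mini-en-2 --sft_type lora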
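For context on the memory question, here is a rough back-of-the-envelope sketch of which memory terms depend on the number of trainable parameters. It is not a measurement of the run above. The parameter count (25.5e9), the adapter size (50e6), bf16 weights and gradients, and an AdamW-style optimizer with two fp32 moments per trainable parameter are all assumptions chosen for illustration. Activations, temporary buffers, and framework overhead are ignored, and those do not shrink just because fewer parameters are trainable.

```python
def estimate_training_memory_gb(total_params, trainable_params,
                                weight_bytes=2, grad_bytes=2, optim_bytes=8):
    """Rough static memory estimate in GB; ignores activations and buffers."""
    weights = total_params * weight_bytes      # every parameter must be stored
    grads = trainable_params * grad_bytes      # only trainable parameters get gradients
    optim = trainable_params * optim_bytes     # assumed: two fp32 moments per trainable parameter
    return (weights + grads + optim) / 1e9


TOTAL = 25.5e9          # assumed total parameter count (illustrative)
LORA_TRAINABLE = 50e6   # hypothetical adapter size (illustrative)

full = estimate_training_memory_gb(TOTAL, TOTAL)
lora = estimate_training_memory_gb(TOTAL, LORA_TRAINABLE)
print(f"full fine-tuning: ~{full:.0f} GB, LoRA: ~{lora:.0f} GB (before activations)")
```

Under these assumptions the static terms differ a lot between the two modes, so identical totals would suggest either that activations dominate in both runs or that the two runs do not actually differ in what is trainable. The sketch cannot tell which of these applies here.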

Question: Support for sparse embeddings?

> Hi, I was wondering whether it would make sense to support models which, in addition to dense vectors, also support sparse and colbert. For example, [BGE-M3](https://huggingface.co/BAAI/bge-m3) works well under...

Question: Support for sparse embeddings?

> @sunzx8 Why are you using threading.Lock? This is harmful for performance and the opposite of how it's meant to be.
>
> Please call multiple of these from multiple...
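The performance concern in the quoted reply can be illustrated with a small sketch. The `embed` function below is a hypothetical stand-in (a short sleep) rather than the library's real API; it only assumes that the underlying call spends most of its time waiting, so that concurrent callers can overlap. Wrapping such a call in a global `threading.Lock` forces the callers to run one after another.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def embed(text):
    """Hypothetical stand-in for an embedding call that mostly waits."""
    time.sleep(0.05)
    return [float(len(text))]


_lock = threading.Lock()


def embed_locked(text):
    # A global lock serializes every caller, even though embed() could overlap.
    with _lock:
        return embed(text)


def timed_map(fn, items, workers=8):
    """Run fn over items in a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fn, items))
    return results, time.perf_counter() - start


texts = [f"sentence {i}" for i in range(8)]
unlocked, t_unlocked = timed_map(embed, texts)
locked, t_locked = timed_map(embed_locked, texts)
print(f"no lock: {t_unlocked:.2f}s, with lock: {t_locked:.2f}s")
```

Both variants return the same results; only the wall-clock time differs, because the locked version cannot overlap the waits.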

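Is the maximum context length of 1.5 only 2048? Can it be set longer, e.g. 4096?

What should I do if I want to extend the training length further? For example, if I want to extend it to 8192, do I need to start over from pretraining?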

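Is the maximum context length of 1.5 only 2048? Can it be set longer, e.g. 4096?

> I don't think you need to pretrain from scratch. A model trained at 4k can be extended directly to 8k-10k without major problems; if you want to go even longer, you may need to fine-tune again on long data.
>
> You could also try our recently released [Mini-InternVL-Chat-2B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-2B-V1-5) and [Mini-InternVL-Chat-4B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5); both models were SFT-trained at 8k length.

Thanks. Roughly how many resources does SFT at a length of 4096 require? Can it be done with 16 × 48 GB GPUs without setting up a slurm cluster?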

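Is the maximum context length of 1.5 only 2048? Can it be set longer, e.g. 4096?

> I don't think you need to pretrain from scratch. A model trained at 4k can be extended directly to 8k-10k without major problems; if you want to go even longer, you may need to fine-tune again on long data.
>
> You could also try our recently released [Mini-InternVL-Chat-2B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-2B-V1-5) and [Mini-InternVL-Chat-4B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5); both models were SFT-trained at 8k length.

I ran a quick test of the 4B model. This is the original image: ![3577300715](https://github.com/OpenGVLab/InternVL/assets/136688632/3288da3e-aa42-4432-8858-50793becd7e4)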

Error when running inference with the ch_ptocr_v4__rec_server_infer.pth model: the loaded model structure and the model parameters do not match.

> You can compare against running from the command line and see where the configuration differs. At first glance I don't see a problem.

I ran into this problem too. The command I ran was:

python -m tools.infer.predict_system --image_dir /home/ubuntu/shawn/test_imgs/sda.jpg --det_model_path /home/ubuntu/shawn/paadletorch/PaddleOCR2Pytorch/ch_ptocr_v4_det_infer.pth --det_yaml_path ./configs/det/ch_PP-OCRv4/ch_PP-OCRv4_det_teacher.yml --rec_image_shape 3,48,320 --rec_model_path /home/ubuntu/shawn/paadletorch/PaddleOCR2Pytorch/ch_ptocr_v4_rec_server_infer.pth --rec_yaml_path ./configs/rec/PP-OCRv4/ch_PP-OCRv4_rec.yml

Is there anything wrong with it?
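One way to narrow down a structure/parameter mismatch like this is to compare the parameter names and shapes that the model built from the YAML expects against those stored in the checkpoint. The helper below is a minimal sketch, not part of the project's tooling; `diff_state_dicts` and the example names and shapes are hypothetical, and it works on plain `{name: shape}` mappings so that it runs without any dependencies.

```python
def diff_state_dicts(model_shapes, ckpt_shapes):
    """Compare two {parameter_name: shape} mappings and report mismatches."""
    model_keys, ckpt_keys = set(model_shapes), set(ckpt_shapes)
    return {
        "missing_in_checkpoint": sorted(model_keys - ckpt_keys),
        "unexpected_in_checkpoint": sorted(ckpt_keys - model_keys),
        "shape_mismatch": sorted(
            k for k in model_keys & ckpt_keys
            if tuple(model_shapes[k]) != tuple(ckpt_shapes[k])
        ),
    }


# Hypothetical names and shapes, for illustration only.
model_shapes = {"backbone.conv.weight": (16, 3, 3, 3), "head.fc.weight": (6625, 120)}
ckpt_shapes = {"backbone.stem.weight": (32, 3, 3, 3), "head.fc.weight": (6625, 512)}

report = diff_state_dicts(model_shapes, ckpt_shapes)
for kind, names in report.items():
    print(kind, names)
```

With PyTorch, the two mappings could be built as `{k: tuple(v.shape) for k, v in sd.items()}`, where `sd` is `model.state_dict()` for the model and the result of `torch.load(path, map_location="cpu")` for the checkpoint, assuming the checkpoint file holds a plain state dict. Many missing and unexpected names at once usually point to a config that describes a different architecture than the one the weights were saved from.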

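Training a 7b model on 28h20 is extremely slow, and bsz=2 already causes OOM

> Such long outputs.......... That said, you could try ring attn + hybrid engine. Also, please refer to [#821](https://github.com/OpenRLHF/OpenRLHF/issues/821).

How do I enable ring attn + hybrid engine? Thanks.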