yepzhang
Also, I noticed during training that when the dataset is loaded, all of the images and videos are loaded along with it, which makes memory consumption huge. Could the author change this logic back to the original SFT behavior, where samples are loaded lazily?
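For reference, a minimal sketch of the lazy-loading pattern being requested (the class and field names here are hypothetical, not ms-swift's actual API): keep only file paths in memory at construction time and decode each image inside `__getitem__`, so peak memory scales with the batch rather than the whole dataset.

```python
from torch.utils.data import Dataset
from PIL import Image

class LazyImageSFTDataset(Dataset):
    """Hypothetical sketch: store lightweight metadata only; decode per sample."""

    def __init__(self, records):
        # records: list of dicts like {"image_path": ..., "text": ...}
        self.records = records  # no pixel data held here

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        # The image is decoded here, on demand, instead of at dataset-load time.
        image = Image.open(rec["image_path"]).convert("RGB")
        return {"image": image, "text": rec["text"]}
```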
> Each machine's resume_from_checkpoint path should be that machine's own checkpoint

I ran into the same problem with single-machine multi-GPU training:

```
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
Missing key(s) in state_dict: "base_model.model.vision_model.embeddings.class_embedding", "base_model.model.vision_model.embeddings.position_embedding", "base_model.model.vision_model.embeddings.patch_embedding.weight", "base_model.model.vision_model.embeddings.patch_embedding.bias", "base_model.model.vision_model.encoder.layers.0.ls1", "base_model.model.vision_model.encoder.layers.0.ls2", "base_model.model.vision_model.encoder.layers.0.attn.qkv.weight", "base_model.model.vision_model.encoder.l[...]
```

I want to resume LoRA training. Is there any way to do this?
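One hedged workaround, assuming the checkpoint directory contains the adapter files PEFT saves (`adapter_config.json` plus adapter weights; the directory name below is a placeholder): load the base model first and attach only the LoRA adapter with `PeftModel.from_pretrained(..., is_trainable=True)`, rather than calling `load_state_dict` with the adapter-only checkpoint on the full model, so the `base_model.*` keys are never expected in the state dict.

```python
from transformers import AutoModel
from peft import PeftModel

# Base weights come from the original model directory.
base = AutoModel.from_pretrained("/InternVL-8B", trust_remote_code=True)

# Attach only the LoRA weights; "checkpoint-xxx" is a placeholder for your
# actual output checkpoint. is_trainable=True keeps the adapter unfrozen
# so training can continue.
model = PeftModel.from_pretrained(
    base, "VTT/trec16_22/checkpoint-xxx", is_trainable=True
)
```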
> First confirm that you are on the latest version, 2.2.3, or the main branch
>
> Then could you share a way to reproduce this?

Here is my bash script. I first trained on another dataset and then resumed on a new dataset; the two datasets have identical keys.

```bash
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type internvl2-8b \
    --model_id_or_path /InternVL-8B \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --output_dir VTT/trec16_22 \
    --dataset ...
```
Thank you very much! If I want to continue finetuning, should I refer to the videochat2 scripts? Thanks!
Also, do you have plans to release a 16-frame version?
> Hi! We released UMT-L since it runs faster.

Hello! Will you release an InternVideo2 version in the future? Thanks!
Hi! I noticed that you're working with LongViLa-LLama3-1024Frames. I'm also trying to run inference with long context but am encountering issues with multi-GPU usage: my model only runs on a single...
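A common way to spread a single model across several GPUs at inference time is the `device_map="auto"` sharding that transformers/accelerate provide. This is a general sketch, not LongViLa's documented loading path, and the model path below is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/LongViLa-LLama3-1024Frames"  # placeholder path

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# device_map="auto" lets accelerate shard the weights across all visible GPUs,
# which long-context models usually need in order to fit KV cache and weights.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
```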