InternVL
Using Swift to perform inference and fine-tune InternVL-Chat-V1.5
Thanks for your awesome work.
swift now supports inference and training of the InternVL-Chat-V1.5 model.
For more information, please refer to our documentation.
For more questions, please raise an issue in the swift repository.
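As a minimal sketch, the corresponding CLI calls look roughly like this (based on the model_type and flags that appear later in this thread; check the swift documentation for the authoritative usage):
# interactive inference
CUDA_VISIBLE_DEVICES=0 swift infer --model_type internvl-chat-v1_5
# fine-tuning on a built-in dataset
CUDA_VISIBLE_DEVICES=0 swift sft --model_type internvl-chat-v1_5 --dataset coco-mini-en-2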
That's great, thank you very much!
Following your tutorial, I get an error at the end saying flash_attn is missing. I'm using a V100; is there a solution?
[INFO:swift] InternVLChatModel: 25514.1861M Params (25514.1861M Trainable [100.0000%]), 402.6563M Buffers.
[INFO:swift] system: You are an AI assistant whose name is InternLM (书生·浦语).
[INFO:swift] Input `exit` or `quit` to exit the conversation.
[INFO:swift] Input `multi-line` to switch to multi-line input mode.
[INFO:swift] Input `reset-system` to reset the system and clear the history.
[INFO:swift] Input `clear` to clear the history.
[INFO:swift] Please enter the conversation content first, followed by the path to the multimedia file.
<<< Describe the content of this image
Input a media path or URL <<< https://img2.baidu.com/it/u=2085854734,3872819026&fm=253&fmt=auto&app=138&f=JPEG?w=762&h=500
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/home/sunyuan/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5/modeling_internlm2.py", line 57, in _import_flash_attn
    from flash_attn import flash_attn_func as _flash_attn_func
ModuleNotFoundError: No module named 'flash_attn'
Solution: set model.config.attn_implementation to eager. Pull the latest code and use --use_flash_attn false.
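For example, on a V100 this would look roughly like the following (a sketch; the --use_flash_attn flag is confirmed later in this thread):
CUDA_VISIBLE_DEVICES=0 swift infer --model_type internvl-chat-v1_5 --use_flash_attn false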
How much GPU memory does full-parameter fine-tuning of InternVL-Chat-V1.5 require with your framework?
I see the README says full-parameter training needs only 4*72G. How is that possible? Isn't this a 26B-parameter model?
Hi, 4*72G is a reference value; training memory varies dynamically with the image size in the training data. This is the memory usage from a full-parameter training run I just did.
Thank you very much. Do you mean that the resolution of the fine-tuning images affects training memory? How large were your images?
I'm still a bit confused. Does "full-parameter" here mean the 20B LLM is not trained, and only the ViT and the connector are fine-tuned?
Image size affects the size of the ViT output, which in turn affects the model's input sequence length. The dataset used was https://www.modelscope.cn/datasets/modelscope/coco_2014_caption/summary
Full-parameter training includes the LLM part. This is the trainable-parameter information from the log: [INFO:swift] InternVLChatModel: 25514.1861M Params (25514.1861M Trainable [100.0000%]), 402.6563M Buffers.
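A rough back-of-the-envelope check (an estimate, not a statement from the swift team): 25.5B trainable parameters in bf16 already take about 25.5e9 × 2 bytes ≈ 51 GB for the weights alone, and gradients, optimizer states, and activations come on top of that, so 4*72G is tight and only feasible with memory-saving measures such as gradient checkpointing and spreading state across the GPUs. Activation memory also grows with the input sequence length, which is why larger images (more ViT tokens per sample) raise the per-step memory usage.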
@hjh0119 Following your tutorial, I wanted to fine-tune the int8 version, so I switched the model name to --model_type internvl-chat-v1_5-int8 and loaded the model weights downloaded from Hugging Face, but I got the error below. It seems related to int8; the 1.5 (non-int8) version does not error and fine-tunes normally. Do I need to add some special parameter?
/data2/renyw/InstallationPackage/anaconda3/envs/swift/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:316: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
/data2/renyw/InstallationPackage/anaconda3/envs/swift/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:316: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
File "/data2/renyw/PythonWorkspace/FM-LLM/swift/swift/cli/sft.py", line 5, in <module>
sft_main()
File "/data2/renyw/PythonWorkspace/FM-LLM/swift/swift/utils/run_utils.py", line 27, in x_main
result = llm_x(args, **kwargs)
File "/data2/renyw/PythonWorkspace/FM-LLM/swift/swift/llm/sft.py", line 256, in llm_sft
trainer.train(training_args.resume_from_checkpoint)
File "/data2/renyw/PythonWorkspace/FM-LLM/swift/swift/trainers/trainers.py", line 50, in train
res = super().train(*args, **kwargs)
File "/data2/renyw/InstallationPackage/anaconda3/envs/swift/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/data2/renyw/InstallationPackage/anaconda3/envs/swift/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/data2/renyw/InstallationPackage/anaconda3/envs/swift/lib/python3.10/site-packages/transformers/trainer.py", line 3147, in training_step
self.accelerator.backward(loss)
File "/data2/renyw/InstallationPackage/anaconda3/envs/swift/lib/python3.10/site-packages/accelerate/accelerator.py", line 2125, in backward
loss.backward(**kwargs)
File "/data2/renyw/InstallationPackage/anaconda3/envs/swift/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
torch.autograd.backward(
File "/data2/renyw/InstallationPackage/anaconda3/envs/swift/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/data2/renyw/InstallationPackage/anaconda3/envs/swift/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
return user_fn(self, *args)
File "/data2/renyw/InstallationPackage/anaconda3/envs/swift/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 479, in backward
.mul_(state.SCB.unsqueeze(1).mul(1.0 / 127.0))
RuntimeError: The size of tensor a (92576) must match the size of tensor b (92553) at non-singleton dimension 0
Train: 0%| | 0/2531 [00:02<?, ?it/s]
@MVP-D77 You don't need to replace model_type. To use local model files, pass the path via --model_id_or_path and still specify --model_type internvl-chat-v1_5-int8.
@hjh0119 Hi, this is my fine-tuning command; could you check whether anything is wrong? It still fails with the same error as above. With the same dataset, internvl-1.5 does not error. Also, if I want to enable flash_attn, do I just add a setting set to true? The printed log shows use_flash_attn as null.
CUDA_VISIBLE_DEVICES=0,1 swift sft --model_type internvl-chat-v1_5-int8 --dataset coco-mini-en-2 --model_id_or_path xxxxxxxxx/InternVL/pretrained/InternVL-Chat-V1-5-Int8
Reproduced; a fix is in progress. To enable flash-attention: --use_flash_attn true. For questions about swift, feel free to open an issue in the swift repository.
@hjh0119 Thank you for the reply and for your work. I have two new questions about InternVL fine-tuning and have opened an issue in the swift repository: https://github.com/modelscope/swift/issues/925. Looking forward to your reply.
Hi, thanks for your reply. How should I increase max_length for fine-tuning? I find that as soon as I move away from the default 2048 (for example to 4096), I get an error:
RuntimeError: CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Hi, does your script support multiple images? The input format I see is shown below. If it is not supported, could you give me a rough idea of how to modify it?
(Only single-turn dialogue is supported; each turn must contain one image; local paths or URLs can be passed in)
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "images": ["image_path"]}
@1028686314 For multiple images, separate the paths with commas in the images field.
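For example, a single multi-image sample would then look like this (placeholder paths):
{"query": "55555", "response": "66666", "images": ["image_path1", "image_path2"]}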
Hi, if I want to feed in as many images as possible for fine-tuning and inference, which settings do I need to change? Is modifying the max_length parameter enough? If I change it for SFT, will it hurt model quality? Right now I can input at most about 8 images, and I want to do SFT on frames extracted from videos, so 8 images doesn't quite meet my needs.
I did LoRA fine-tuning with the official scripts. When I tried to merge with swift, the merged weights turned out identical to the original weights; the LoRA weights were not merged in. What could be the reason? Do I also have to do the fine-tuning with swift?
There is also a message saying sft_args.json not found.
That file is generated by swift after fine-tuning. If you don't have it, it doesn't look like this will work.
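For reference, a sketch of the usual merge invocation on a swift-produced checkpoint directory (the path is a placeholder; the directory is the one containing sft_args.json):
CUDA_VISIBLE_DEVICES=0 swift export --ckpt_dir output/internvl-chat-v1_5/vx-xxx/checkpoint-xxx --merge_lora true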
Thanks!
Could you share your sft_args.json? I'd like to try adjusting my config and merging again. Also, does swift not support fine-tuning on image-text dialogue data? Converting my existing dataset would be a bit of a hassle. Thanks!
{ "model_type": "internvl-chat-v1_5", "model_id_or_path": "/mnt/workspace/workgroup/shiyi/weights/InternVL-Chat-V1-5", "model_revision": "master", "sft_type": "lora", "freeze_parameters": 0.0, "additional_trainable_parameters": [], "tuner_backend": "peft", "template_type": "internvl", "output_dir": "/mnt/workspace/workgroup/swift/output/internvl-chat-v1_5/v0-20240605-173708", "add_output_dir_suffix": true, "ddp_backend": null, "ddp_find_unused_parameters": null, "ddp_broadcast_buffers": null, "seed": 42, "resume_from_checkpoint": null, "ignore_data_skip": false, "dtype": "bf16", "packing": false, "dataset": [ "/mnt/workspace/workgroup/swift/data/test.jsonl" ], "val_dataset": [], "dataset_seed": 42, "dataset_test_ratio": 0.01, "use_loss_scale": false, "system": "You are an AI assistant whose name is InternLM (书生·浦语).", "max_length": 2048, "truncation_strategy": "delete", "check_dataset_strategy": "none", "model_name": [ null, null ], "model_author": [ null, null ], "quant_method": null, "quantization_bit": 0, "hqq_axis": 0, "hqq_dynamic_config_path": null, "bnb_4bit_comp_dtype": "bf16", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "bnb_4bit_quant_storage": null, "lora_target_modules": [ "wqkv" ], "lora_rank": 8, "lora_alpha": 32, "lora_dropout_p": 0.05, "lora_bias_trainable": "none", "lora_modules_to_save": [], "lora_dtype": null, "lora_lr_ratio": null, "use_rslora": false, "use_dora": false, "init_lora_weights": true, "boft_block_size": 4, "boft_block_num": 0, "boft_n_butterfly_factor": 1, "boft_target_modules": [ "DEFAULT" ], "boft_dropout": 0.0, "boft_modules_to_save": [], "vera_rank": 256, "vera_target_modules": [ "DEFAULT" ], "vera_projection_prng_key": 0, "vera_dropout": 0.0, "vera_d_initial": 0.1, "vera_modules_to_save": [], "adapter_act": "gelu", "adapter_length": 128, "use_galore": false, "galore_rank": 128, "galore_target_modules": null, "galore_update_proj_gap": 50, "galore_scale": 1.0, "galore_proj_type": "std", "galore_optim_per_parameter": false, "galore_with_embedding": false, "adalora_target_r": 8, "adalora_init_r": 12, "adalora_tinit": 0, "adalora_tfinal": 0, "adalora_deltaT": 1, "adalora_beta1": 0.85, "adalora_beta2": 0.85, "adalora_orth_reg_weight": 0.5, "ia3_target_modules": [ "DEFAULT" ], "ia3_feedforward_modules": [], "ia3_modules_to_save": [], "llamapro_num_new_blocks": 4, "llamapro_num_groups": null, "neftune_noise_alpha": null, "neftune_backend": "transformers", "lisa_activated_layers": 0, "lisa_step_interval": 20, "gradient_checkpointing": true, "deepspeed": null, "batch_size": 1, "eval_batch_size": 1, "num_train_epochs": 1, "max_steps": -1, "optim": "adamw_torch", "adam_beta1": 0.9, "adam_beta2": 0.999, "learning_rate": 0.0001, "weight_decay": 0.1, "gradient_accumulation_steps": 16, "max_grad_norm": 0.5, "predict_with_generate": false, "lr_scheduler_type": "linear", "warmup_ratio": 0.05, "eval_steps": 50, "save_steps": 50, "save_only_model": false, "save_total_limit": 2, "logging_steps": 5, "dataloader_num_workers": 0, "dataloader_pin_memory": false, "dataloader_drop_last": false, "push_to_hub": false, "hub_model_id": null, "hub_token": null, "hub_private_repo": false, "push_hub_strategy": "push_best", "test_oom_error": false, "disable_tqdm": false, "lazy_tokenize": true, "preprocess_num_proc": 1, "use_flash_attn": null, "ignore_args_error": false, "check_model_is_latest": true, "logging_dir": "/mnt/workspace/workgroup/swift/output/internvl-chat-v1_5/v0-20240605-173708/runs", "report_to": [ "tensorboard" ], "acc_strategy": "token", 
"save_on_each_node": true, "evaluation_strategy": "steps", "save_strategy": "steps", "save_safetensors": true, "gpu_memory_fraction": null, "include_num_input_tokens_seen": false, "local_repo_path": null, "custom_register_path": null, "custom_dataset_info": null, "device_map_config_path": null, "max_new_tokens": 2048, "do_sample": true, "temperature": 0.3, "top_k": 20, "top_p": 0.7, "repetition_penalty": 1.0, "num_beams": 1, "fsdp": "", "fsdp_config": null, "sequence_parallel_size": 1, "model_layer_cls_name": null, "metric_warmup_step": 0, "fsdp_num": 1, "per_device_train_batch_size": null, "per_device_eval_batch_size": null, "self_cognition_sample": 0, "train_dataset_mix_ratio": 0.0, "train_dataset_mix_ds": [ "ms-bench" ], "train_dataset_sample": -1, "val_dataset_sample": null, "safe_serialization": null, "only_save_model": null, "neftune_alpha": null, "deepspeed_config_path": null, "model_cache_dir": null, "custom_train_dataset_path": [ "/mnt/workspace/workgroup/swift/data/test.jsonl" ], "custom_val_dataset_path": [], "use_self_cognition": false, "lora_use_embedding": false, "lora_use_all": false, "lora_m2s_use_embedding": false, "lora_m2s_use_ln": false, "torch_dtype": "torch.bfloat16", "fp16": false, "bf16": true, "bnb_4bit_compute_dtype": "torch.bfloat16", "load_in_4bit": false, "load_in_8bit": false, "train_sampler_random": true, "training_args": "Seq2SeqTrainingArguments(output_dir='/mnt/workspace/workgroup/swift/output/internvl-chat-v1_5/v0-20240605-173708', overwrite_output_dir=False, do_train=False, do_eval=True, do_predict=False, eval_strategy=<IntervalStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=1, per_device_eval_batch_size=1, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=16, eval_accumulation_steps=None, eval_delay=0, learning_rate=0.0001, weight_decay=0.1, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=0.5, num_train_epochs=1, max_steps=-1, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, lr_scheduler_kwargs={}, warmup_ratio=0.05, warmup_steps=0, log_level='passive', log_level_replica='warning', log_on_each_node=True, logging_dir='/mnt/workspace/workgroup/swift/output/internvl-chat-v1_5/v0-20240605-173708/runs', logging_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_first_step=True, logging_steps=5, logging_nan_inf_filter=True, save_strategy=<IntervalStrategy.STEPS: 'steps'>, save_steps=50, save_total_limit=2, save_safetensors=True, save_on_each_node=True, save_only_model=False, restore_callback_states_from_checkpoint=False, no_cuda=False, use_cpu=False, use_mps_device=False, seed=42, data_seed=None, jit_mode_eval=False, use_ipex=False, bf16=True, fp16=False, fp16_opt_level='O1', half_precision_backend='auto', bf16_full_eval=False, fp16_full_eval=False, tf32=None, local_rank=0, ddp_backend=None, tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=False, eval_steps=50, dataloader_num_workers=0, dataloader_prefetch_factor=None, past_index=-1, run_name='/mnt/workspace/workgroup/swift/output/internvl-chat-v1_5/v0-20240605-173708', disable_tqdm=False, remove_unused_columns=False, label_names=None, load_best_model_at_end=False, metric_for_best_model='loss', greater_is_better=False, ignore_data_skip=False, fsdp=[], fsdp_min_num_params=0, fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_transformer_layer_cls_to_wrap=None, accelerator_config=AcceleratorConfig(split_batches=False, 
dispatch_batches=None, even_batches=True, use_seedable_sampler=True, non_blocking=False, gradient_accumulation_kwargs=None), deepspeed=None, label_smoothing_factor=0.0, optim=<OptimizerNames.ADAMW_TORCH: 'adamw_torch'>, optim_args=None, adafactor=False, group_by_length=False, length_column_name='length', report_to=['tensorboard'], ddp_find_unused_parameters=None, ddp_bucket_cap_mb=None, ddp_broadcast_buffers=None, dataloader_pin_memory=False, dataloader_persistent_workers=False, skip_memory_metrics=True, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, hub_model_id=None, hub_strategy=<HubStrategy.EVERY_SAVE: 'every_save'>, hub_token=None, hub_private_repo=False, hub_always_push=False, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, include_inputs_for_metrics=False, eval_do_concat_batches=True, fp16_backend='auto', evaluation_strategy='steps', push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=None, mp_parameters='', auto_find_batch_size=False, full_determinism=False, torchdynamo=None, ray_scope='last', ddp_timeout=1800, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, dispatch_batches=None, split_batches=None, include_tokens_per_second=False, include_num_input_tokens_seen=False, neftune_noise_alpha=None, optim_target_modules=None, batch_eval_metrics=False, sortish_sampler=True, predict_with_generate=False, generation_max_length=None, generation_num_beams=None, generation_config=GenerationConfig {\n "do_sample": true,\n "eos_token_id": 2,\n "max_new_tokens": 2048,\n "pad_token_id": 2,\n "temperature": 0.3,\n "top_k": 20,\n "top_p": 0.7\n}\n, train_sampler_random=True, push_hub_strategy='push_best', acc_strategy='token', additional_saved_files=[], metric_warmup_step=0, train_dataset_sample=4)" }
Hi everyone, does fine-tuning Mini-InternVL-1.5 with swift support this kind of interleaved image-text format: multi-turn dialogue where each turn may contain multiple images or no image, with local paths or URLs?
[
{"conversations": [
{"from": "user", "value": "Picture 1:img_path\n11111"},
{"from": "assistant", "value": "22222"}
]},
{"conversations": [
{"from": "user", "value": "Picture 1:img_path\nPicture 2:img_path2\nPicture 3:img_path3\naaaaa"},
{"from": "assistant", "value": "bbbbb"},
{"from": "user", "value": "Picture 1:img_path\nccccc"},
{"from": "assistant", "value": "ddddd"}
]},
{"conversations": [
{"from": "user", "value": "AAAAA"},
{"from": "assistant", "value": "BBBBB"},
{"from": "user", "value": "CCCCC"},
{"from": "assistant", "value": "DDDDD"}
]}
]
Hi, after fine-tuning a multi-image task with swift using comma-separated image paths, can I run inference the way InternVL does multi-image inference, i.e. the torch.cat approach shown in the image?
Thanks for your interest; for the details, please refer to the swift implementation.
Can you provide example code for working with the InternVL2 model on TPUs (XLA)? @hjh0119
