Zhe Chen comments

Results 316 comments of


                                            Zhe Chen

如何微调InternVL-Chat-V1.2-Plus

> 哦哦好滴感谢如果只是做中文的图文对话微调该怎么设置一下呀在[第二步](https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat/CONTINUED_FINETUNE.md#2-prepare-your-custom-training-data)里，准备一下中文数据的meta信息。首先在`internvl_chat/shell/data/`这个目录底下新建一个json文件，在里面写上你的中文数据集的meta信息，这里以中文数据`llava_instruct_150k_zh`为例，就是： ``` { "llava_instruct_150k_zh": { "root": "playground/data/coco/", "annotation": "playground/llava_instruct_150k_zh.jsonl", "data_augment": false, "repeat_time": 1, "length": 157712 } } ``` 如果数据量不大的话，可以选择微调lora模型，那么就使用[这个shell脚本](https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat/shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_finetune_continue_lora.sh)。在`--model_name_or_path`填写下载的模型路径，如果要微调Plus版本，请修改为`--model_name_or_path "./pretrained/InternVL-Chat-Chinese-V1-2-Plus"`。然后在`--meta_path`这里写上刚刚新建的json文件的路径。最后用2个A100 80G GPU来训练这个模型。

如何微调InternVL-Chat-V1.2-Plus

## 数据格式我们的数据采用了JSONL（JSON Lines）格式。JSONL是一种文本格式，每行都是一个独立的JSON对象。每个JSON对象表示一个数据示例，其中包含了对话内容以及相关的元数据。 ## 数据结构每个数据示例都包含以下字段： - `"id"`: 数据示例的唯一标识符。 - `"image"`: 图像文件的路径。 - `"conversations"`: 对话内容列表，包含了交替的用户（human）和模型（gpt）对话。对话内容列表中的每个对话对象包含以下字段： - `"from"`: 对话的发起者，可以是 "human" 或 "gpt"。 - `"value"`: 对话内容。 ## 数据准备步骤为了准备数据以供使用，您可以按照以下步骤进行操作： 1....

Zhe Chen

如何微调InternVL-Chat-V1.2-Plus

如何微调InternVL-Chat-V1.2-Plus

great work. But inference is too slow

Convert to Gguf format to work with Llama.cpp?

Large 34B model OOM in evaluation

Is there a released mlp pretrained weights for internvl1.2 version

InternVL−Chat−V1.5-Int8的耗时是InternVL−Chat−V1.5的三倍吗？

Can i extract image and text feature respectively in InternVL-G model?

Can i extract image and text feature respectively in InternVL-G model?

提取营业执照“经营范围“字段（较多行数据）时，会有出现大量的重复数据