CogVLM issues

Is it feasible to fine-tune CogVLM on two 3090s?

Has anyone tried fine-tuning CogVLM on two RTX 3090 cards? Is it feasible?

KDD2018

About multi-turn dialogue

9

### System Info
cuda12.1
### Who can help?
_No response_
### Information
- [X] The official example scripts
- [X] My own...

Jackluisus

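Does the CogVLM source code support multi-turn dialogue training?

Is chat_old_history_to_prompt what multi-turn dialogue uses? Suppose the label for one sample is an image with text pairs: Q1, A1, Q2, A2. Does chat_old_history_to_prompt build prompt = Q1, A1, Q2, with the loss computed between the prediction and A2? How many times is one such sample (Q1, A1, Q2, A2) passed through the network? Once with prompt = Q1, then again with prompt = Q1, A1, Q2? And how should dataset.py read multi-turn dialogue labels?
![image](https://github.com/THUDM/CogVLM/assets/37207093/f679cded-0bfb-4544-8996-2984cd794b58)
When fine-tuning with the sat format, how is multi-turn dialogue training done? Which part of the code mainly needs to change to support it? Does chat_old_history_to_prompt need to be called?
![image](https://github.com/THUDM/CogVLM/assets/37207093/dc6c26e6-a445-461b-a3a6-9ae78abcb84f)
Which model should I start from to train on my own multi-turn dialogue dataset? Is there any difference between reading a multi-turn dataset and reading a single-turn one?
![image](https://github.com/THUDM/CogVLM/assets/37207093/e3992d69-4425-40a8-ac6f-daff0108efa4)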

elesun2018
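
For readers following the multi-turn question above, here is a minimal sketch of one common way to train on a (Q1, A1, Q2, A2) sample in a single pass: concatenate all turns and compute the loss only on answer tokens. This is not confirmed to be what CogVLM's dataset.py does. The `Question: ... Answer:` template, the whitespace tokenizer, and the function name `build_multiturn_example` are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def whitespace_tokenize(text: str) -> List[str]:
    # Stand-in tokenizer (assumption); a real pipeline would use the model's tokenizer.
    return text.split()


def build_multiturn_example(
    turns: List[Tuple[str, str]],
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> Tuple[List[str], List[int]]:
    """Concatenate (question, answer) turns and mark which tokens carry loss.

    Returns (tokens, loss_mask), where loss_mask[i] == 1 only for answer tokens.
    """
    tokens: List[str] = []
    loss_mask: List[int] = []
    for question, answer in turns:
        # Hypothetical prompt template; adjust to the template your model expects.
        q_tokens = tokenize(f"Question: {question} Answer:")
        a_tokens = tokenize(answer)
        tokens += q_tokens + a_tokens
        # Questions are context only; answers are the prediction targets.
        loss_mask += [0] * len(q_tokens) + [1] * len(a_tokens)
    return tokens, loss_mask


tokens, loss_mask = build_multiturn_example([("Q1", "A1"), ("Q2", "A2")])
print(list(zip(tokens, loss_mask)))
```

With this layout each conversation is seen once per epoch and both A1 and A2 contribute to the loss. The alternative raised in the question, one training sample per turn with the earlier turns as prompt, yields the same targets at the cost of re-encoding the shared prefix.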

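How does the model's visual grounding work?

Suppose I want to fine-tune on a Q&A-style dataset using the third form of visual grounding, that is, producing a description from bounding-box coordinates. During training, how does the model extract the coordinate ([[ ]]) information from the training data and combine it with the corresponding description? Where is the code that implements this?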

zcqzcqzcq88
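
As a way to inspect the coordinate annotations mentioned above, the following sketch pulls `[[x0,y0,x1,y1]]` spans out of a text string and converts them to pixel coordinates. It is an illustration under assumptions, not the repository's code: it assumes each span holds a single box of four integers normalized to a 0 to 999 grid (hence the default `scale=1000`), and `extract_boxes` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Matches one box written as [[x0,y0,x1,y1]] with 1-3 digit integers (assumed format).
BOX_PATTERN = re.compile(r"\[\[(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})\]\]")


def extract_boxes(
    text: str, width: int, height: int, scale: int = 1000
) -> List[Tuple[float, float, float, float]]:
    """Return every box found in `text`, rescaled to pixel coordinates."""
    boxes = []
    for match in BOX_PATTERN.finditer(text):
        x0, y0, x1, y1 = (int(v) for v in match.groups())
        boxes.append(
            (
                x0 * width / scale,
                y0 * height / scale,
                x1 * width / scale,
                y1 * height / scale,
            )
        )
    return boxes


print(extract_boxes("a dog [[100,200,500,900]] on the grass", width=1000, height=500))
```

Running the same conversion in reverse (pixel coordinates to normalized integers) is how such annotations would typically be written into Q&A training text, again under the normalization assumption stated above.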

How do I build a fine-tuning dataset for CogAgent?

1

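The official Captcha Images dataset contains only images and no labels. If I want to build my own dataset (images plus image descriptions), how should the JSON file be written, and how should dataset.py be modified? Thanks in advance to anyone who can help!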

Li-mingshen

Fine-tuning CogAgent on 8 A800 (80G) GPUs still fails with: CUDA out of memory

5

### System Info
torch 2.0.1+cu118
torchaudio 2.0.2+cu118
torchvision 0.15.2+cu118
cuda 11.8
### Who can help?
![1](https://github.com/THUDM/CogVLM/assets/54988783/b4ab73e9-7396-4528-b5dc-781b930cec67)
### Information
- [X] The official example scripts...

GuoXu-booo
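
To put the out-of-memory report above in context, here is a back-of-the-envelope sketch of how much memory full fine-tuning needs for model state alone. It assumes mixed-precision training with Adam at roughly 16 bytes per parameter (half-precision weights and gradients plus full-precision master weights and two optimizer moments) and ignores activations entirely. The 18 billion parameter count is an assumed figure, not taken from the issue, and `training_state_gib` is a hypothetical helper.

```python
def training_state_gib(
    num_params: float, bytes_per_param: int = 16, num_shards: int = 1
) -> float:
    """Rough size in GiB of weights + gradients + Adam states held by each shard.

    num_shards > 1 models the case where all of that state is evenly
    partitioned across GPUs; plain data parallelism corresponds to num_shards=1.
    """
    return num_params * bytes_per_param / num_shards / 2**30


ASSUMED_PARAMS = 18e9  # assumption for illustration only

print(f"unsharded: {training_state_gib(ASSUMED_PARAMS):.0f} GiB per GPU")
print(
    f"sharded over 8 GPUs: "
    f"{training_state_gib(ASSUMED_PARAMS, num_shards=8):.0f} GiB per GPU"
)
```

Under these assumptions the unsharded state alone exceeds a single 80 GB card, so whether eight cards suffice depends on how much of the state is actually partitioned and on activation memory at the chosen image resolution and batch size.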

I want three different answers to the same prompt, clearing the context each time. Why are the results always identical?

2

This is the calling code:

```python
for i in range(3):
    while True:
        print(query)
        response, history, cache_image = chat(
            image_path,
            model,
            text_processor_infer,
            image_processor,
            query,
            history=history,
            cross_img_processor=cross_image_processor,
            image=cache_image,
            max_length=args.max_length,
            top_p=args.top_p,
            temperature=args.temperature,
            top_k=args.top_k,
            invalid_slices=text_processor_infer.invalid_slices,
            args=args
        )
        print(response)
        history...
```

tygogogo
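
Two generic reasons why repeated sampling can return identical text are a fixed random seed and a `top_k` of 1, which reduces sampling to greedy decoding. The sketch below demonstrates both with a toy sampler; whether either applies to the `chat` call in the issue above is not established. `sample_token` is a hypothetical stand-in, not the repository's sampling strategy.

```python
import math
import random
from typing import List, Optional


def sample_token(
    logits: List[float],
    temperature: float = 1.0,
    top_k: int = 0,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample one token index with temperature scaling and optional top-k filtering."""
    rng = rng or random.Random()
    # Candidate indices, highest logit first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # Subtract the peak logit before exponentiating for numerical stability.
    peak = logits[order[0]]
    weights = [math.exp((logits[i] - peak) / temperature) for i in order]
    return rng.choices(order, weights=weights, k=1)[0]


logits = [1.0, 3.0, 2.0]

# top_k=1 leaves a single candidate, so every draw is the argmax.
print([sample_token(logits, top_k=1) for _ in range(5)])

# Re-seeding with the same value before each run replays the same draws.
for _ in range(2):
    rng = random.Random(1234)
    print([sample_token(logits, rng=rng) for _ in range(5)])
```

If the real pipeline seeds its generator once per call, or is configured with `top_k=1` or a very low temperature, the same effect would appear regardless of whether the history is cleared.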

Running CogVLM and CogAgent on MPS

4

### System Info
Using Mac M3 Pro 18GB unified memory
### Who can help?
_No response_
### Information
- [ ] The official example...

Allisterlim

CogAgent visual pretrained model EVA2-CLIP-L

1

I want to use CogAgent's visual pretrained model. In [sat](https://github.com/THUDM/SwissArmyTransformer/blob/main/sat/resources/urls.py) I can find only one model, named eva02_L_pt_m38m_p14. Is eva02_L_pt_m38m_p14 the visual pretrained model used by CogAgent? @1049451037

hzhiyuan

Chat with PDF documentation instead of images

### Feature request
Hello guys, is it possible to use the model to chat with PDF documents? These documents contain both text and images. If yes,...

moncefarajdal
