DeepSpeedExamples
Inquiry about Embedding Concatenation in DeepSpeed-VisualChat
@yaozhewei First, I'd like to extend my gratitude for the incredible work you've been doing with DeepSpeedExamples. It's truly commendable and has been a great resource for the community.
As I was exploring the code, particularly the section where language and visual embeddings are concatenated, I came across something that prompted a question. I noticed in this line of code:
https://github.com/microsoft/DeepSpeedExamples/blob/60e412eaa7275212e240f31055fc8b814ebe653f/applications/DeepSpeed-VisualChat/utils/model/modeling_dsvl.py#L226
that img_pos_list is reversed during the insertion of the image embedding. However, it appears that cur_img is not reversed in the process. Could there potentially be a mismatch between the visual and text information due to this? I'm curious if this is intentional for alignment purposes or if it might be an oversight.
I would appreciate any clarification you can provide on this matter.
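To make the concern concrete, here is a minimal sketch in plain Python, not the actual modeling_dsvl.py code. It assumes each image replaces a single placeholder token with several embedding tokens; the names PLACEHOLDER and insert_images and the toy data are hypothetical and used only for illustration. Plain lists stand in for tensors, and slicing with [::-1] plays the role of the flip.

```python
# Hypothetical marker standing in for the image position in the text sequence.
PLACEHOLDER = "<img>"


def insert_images(tokens, positions, images):
    """Replace the placeholder at each position with that image's tokens,
    pairing positions and images in exactly the order they are given."""
    out = list(tokens)
    for pos, img in zip(positions, images):
        # Keep everything before the placeholder, splice in the image tokens,
        # then keep everything after the placeholder.
        out = out[:pos] + list(img) + out[pos + 1:]
    return out


tokens = ["a", PLACEHOLDER, "b", PLACEHOLDER, "c"]
positions = [1, 3]                      # placeholder indices, first image first
images = [["A1", "A2"], ["B1", "B2"]]   # image A belongs at 1, image B at 3

# Positions reversed, images not reversed: the pairing is swapped.
mismatched = insert_images(tokens, positions[::-1], images)

# Positions and images both reversed: each image lands at its own placeholder.
matched = insert_images(tokens, positions[::-1], images[::-1])

print(mismatched)  # ['a', 'B1', 'B2', 'b', 'A1', 'A2', 'c']
print(matched)     # ['a', 'A1', 'A2', 'b', 'B1', 'B2', 'c']
```

In this toy setting, reversing only the positions puts the second image where the first should be. With a single image per sequence the two variants coincide, so the effect would only show up on multi-image samples.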
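I built a training set of 130k image-text pairs. vis_encoder = 'clip-vit-large-patch14'; lang_encoder is a language model I fine-tuned earlier (7B Chinesellama). Training parameters: lr=1e-3, epoch=6, warmup=200.
First run: I used the original code without any modification. Over 6 epochs, the loss had almost stopped decreasing by epoch 3-4 and plateaued at around 2.1. After 6 epochs the final loss was ~1.95 with eval_loss 2.2, so the model was already overfitting; the best eval loss along the way was only about 2.1. In actual testing, the model performed very poorly.
Second run (in progress): I changed the line above to: for img_i, img_pos in zip(cur_img, img_pos_list): so that the concatenation is done in order. Currently epoch=2.3, loss~1.98, eval-loss=2.16. My impression is that convergence is somewhat faster this time, but it is too early to judge the final result.
I will report back with the validation results.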
Removing it directly doesn't seem quite right. I recommend keeping it and applying the flip operation to both 'cur_img' and 'img_pos_list'.
Hi both, sorry for the late reply. You two are likely right: we should apply the flip operator to both lists. The reason we need to insert in reverse is that, if the original order is used, the insertion position of the second (and any later) image changes because of the insertion of the first image.
I have left the DeepSpeed team and no longer have GPU access to validate this. @jeffra @tjruwase could you find someone on the team to help verify this?
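The index-shift argument above can be sketched with plain Python lists. This is a simplified illustration under the assumption that each image replaces one placeholder token with several tokens; it is not the repository's implementation, and the function names and toy data are hypothetical.

```python
def splice(seq, pos, img):
    """Replace the single element at index pos with the tokens of img."""
    return seq[:pos] + list(img) + seq[pos + 1:]


def insert_forward_naive(tokens, positions, images):
    """In-order insertion using the original indices. Later indices go stale
    because every earlier insertion lengthens the sequence."""
    out = list(tokens)
    for pos, img in zip(positions, images):
        out = splice(out, pos, img)
    return out


def insert_reversed(tokens, positions, images):
    """Insert from the last placeholder to the first, reversing BOTH lists,
    so earlier indices are never affected by later insertions."""
    out = list(tokens)
    for pos, img in zip(reversed(positions), reversed(images)):
        out = splice(out, pos, img)
    return out


def insert_forward_with_offset(tokens, positions, images):
    """In-order alternative: track how far earlier insertions shifted things."""
    out = list(tokens)
    offset = 0
    for pos, img in zip(positions, images):
        out = splice(out, pos + offset, img)
        offset += len(img) - 1  # one placeholder removed, len(img) tokens added
    return out


tokens = ["a", "<img>", "b", "<img>", "c"]
positions = [1, 3]
images = [["A1", "A2"], ["B1", "B2"]]

naive = insert_forward_naive(tokens, positions, images)
rev = insert_reversed(tokens, positions, images)
fwd = insert_forward_with_offset(tokens, positions, images)

print(naive)  # ['a', 'A1', 'A2', 'B1', 'B2', '<img>', 'c']
print(rev)    # ['a', 'A1', 'A2', 'b', 'B1', 'B2', 'c']
print(fwd)    # ['a', 'A1', 'A2', 'b', 'B1', 'B2', 'c']
```

In this sketch the naive in-order version overwrites the text token "b" and leaves the second placeholder in place, while reversing both lists or tracking an offset gives the same correct result. Whether the in-order loop in the real code shows the same stale-index behavior depends on how it computes slice boundaries, which this toy example does not reproduce.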
> Removing it directly doesn't seem quite right. I recommend keeping it and applying the flip operation to both 'cur_img' and 'img_pos_list'.
You are right. I got a better model than last time.