InternVL

InternVL question discussion

Open • xiaohangguo opened this issue 1 year ago • 2 comments

Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [X] 2. The bug has not been fixed in the latest version.
  • [X] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

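This isn't a bug report. I'd like to ask about stage-one training in the InternVL paper.

  1. LLaMA is a decoder-only architecture. In stage one, after an image is encoded by InternViT, is the pairwise loss computed against vectors obtained through the LLaMA tokenizer? And is this LLaMA simply discarded afterwards, now that there is QLLaMA?
  2. The data for this stage should be CLIP-style image-text pairs, right? If I want to continue with SFT, is there code for that now? I read through the repository but couldn't find any. What should I do?

[Image: 96ae94019bc0a7fe7eda067a8bbf257]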
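The question asks whether stage one computes a pairwise loss between InternViT image features and LLaMA-derived text features, and compares the data to CLIP's image-text pairs. For reference, a CLIP-style objective over a batch of N matched pairs is a symmetric cross-entropy (InfoNCE) over the N×N cosine-similarity matrix, with pair i as the positive for row i and column i. The sketch below is a minimal pure-Python illustration of that generic objective, assuming plain lists of floats stand in for encoder outputs. It is not InternVL's training code; the function names and the 0.07 temperature are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]

# Illustrative CLIP-style contrastive loss; NOT taken from the InternVL repository.


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def log_softmax(row: Vector) -> Vector:
    """Numerically stable log-softmax (subtracts the max before exponentiating)."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_contrastive_loss(image_emb: List[Vector], text_emb: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over N matched (image, text) pairs.

    Pair i is the positive for row i and column i; the other N - 1 entries
    in that row/column act as in-batch negatives.
    """
    if not image_emb or len(image_emb) != len(text_emb):
        raise ValueError("need the same non-zero number of images and texts")
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cos(image i, text j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # image -> text: cross-entropy over each row, target on the diagonal
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # text -> image: cross-entropy over each column, target on the diagonal
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[0.9, 0.1], [0.1, 0.9]]
    print("matched:", clip_contrastive_loss(images, texts))
    print("swapped:", clip_contrastive_loss(images, texts[::-1]))
```

When each image is closest to its own caption the diagonal dominates and the loss approaches zero; if every embedding is identical, each softmax is uniform and the loss equals log(N).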

Reproduction

None

Environment

None

Error traceback

None

xiaohangguo commented Aug 19 '24 03:08

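A follow-up on question 1: on the text side, is the caption tokenized first, then fed to LLaMA to get hidden states, which are then used as the text embeddings?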

xiaohangguo commented Aug 19 '24 05:08
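The follow-up describes the text side as: tokenize, run LLaMA, use the hidden states as embeddings. A per-token hidden-state sequence still has to be reduced to one vector per caption before it can be compared with an image feature. The sketch below shows two common reductions, masked mean pooling and last-token pooling, on plain lists standing in for real model outputs. Which reduction InternVL uses is not stated in this thread, and both function names are hypothetical.

```python
from typing import List

# Hypothetical helpers (not from the InternVL codebase): each reduces a
# (seq_len x hidden_dim) hidden-state sequence to a single caption embedding.


def masked_mean_pool(hidden_states: List[List[float]],
                     attention_mask: List[int]) -> List[float]:
    """Average hidden states over real tokens (mask == 1), ignoring padding."""
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask lengths differ")
    kept = [h for h, m in zip(hidden_states, attention_mask) if m == 1]
    if not kept:
        raise ValueError("sequence contains no unmasked tokens")
    dim = len(kept[0])
    return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]


def last_token_pool(hidden_states: List[List[float]],
                    attention_mask: List[int]) -> List[float]:
    """Return the hidden state of the last real token (assumes right padding).

    In a causal decoder such as LLaMA, only the last position has attended
    to the entire caption, which is why this reduction is common for
    decoder-only text encoders.
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask lengths differ")
    for i in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[i] == 1:
            return list(hidden_states[i])
    raise ValueError("sequence contains no unmasked tokens")


if __name__ == "__main__":
    # Three positions, the last one padding; hidden_dim = 2.
    hs = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
    mask = [1, 1, 0]
    print("mean:", masked_mean_pool(hs, mask))
    print("last:", last_token_pool(hs, mask))
```

Either vector could then serve as the caption embedding in a contrastive objective; the padding row is excluded in both cases, so its large values do not leak into the result.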

Hi,

  1. QLLaMA inherits the stage-one LLaMA-7B weights. For details of the stage-two data, please refer to the paper https://arxiv.org/pdf/2312.14238;
  2. For continued SFT, please refer to https://internvl.readthedocs.io/en/latest/internvl2.0/finetune.html

G-z-w commented Aug 28 '24 04:08