InternVL
InternVL question discussion
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
This is not a bug report; I'd like to ask about the stage-1 training described in the InternVL paper.
1. LLaMA is a decoder-only architecture. In stage 1, after the image is encoded by InternViT, is the pairwise (contrastive) loss computed against the text embeddings produced via the LLaMA tokenizer? And is this LLaMA simply discarded afterwards? What about the current QLLaMA?
2. The data for this stage should be image-text pairs similar to CLIP's, right? If I want to continue with SFT, is there code support for that now? I read through the repository code but couldn't find it. What should I do?
Reproduction
None
Environment
None
Error traceback
None
A supplement to question 1: for the text side, is the input tokenized and then fed to LLaMA, with the resulting hidden states used as the text embeddings?
Hello,
- QLLaMA inherits the weights of the stage-1 LLaMA-7B. For details on the stage-2 data, please refer to the paper: https://arxiv.org/pdf/2312.14238;
- For continuing with SFT, please refer to https://internvl.readthedocs.io/en/latest/internvl2.0/finetune.html
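For readers following the question about the stage-1 loss: the training objective discussed above is a CLIP-style symmetric contrastive (InfoNCE) loss between pooled image features and pooled text features. Below is a minimal sketch of that loss in NumPy. It is an illustration only, not the InternVL implementation: the function names, feature shapes, and the temperature value are assumptions, and in the actual pipeline the features would come from InternViT and the LLM's hidden states rather than random arrays.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project feature vectors onto the unit sphere."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_style_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (image, text) embeddings.

    img_feats, txt_feats: (B, D) arrays; row i of each is a matching pair.
    temperature: hypothetical value; the real training recipe may differ.
    """
    img = l2_normalize(img_feats)
    txt = l2_normalize(txt_feats)
    logits = img @ txt.T / temperature      # (B, B) cosine-similarity matrix
    labels = np.arange(len(logits))         # matching pairs lie on the diagonal

    def cross_entropy(lg):
        # Numerically stable log-softmax over each row.
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With perfectly aligned features the loss approaches zero, while mismatched random features give a loss near `log(B)`, which is one quick way to sanity-check a contrastive setup.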