InternVL
InternVL question discussion
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
This is not a bug report; I'd like to ask about the stage-1 training described in the InternVL paper.
1. LLaMA is a decoder-only architecture. In stage 1, after the image is encoded by InternViT, is the pairwise (contrastive) loss computed against the text embeddings produced via the LLaMA tokenizer? And is this LLaMA simply discarded afterwards? What about the current QLLaMA?
2. The data for this stage should be image-text pairs similar to CLIP's, right? If I want to continue with SFT, is there code support for that now? I read through the repository code but couldn't find it. What should I do?
Reproduction
None
Environment
None
Error traceback
None
A supplement to question 1: for the text side, is the input tokenized and then fed to LLaMA, with the resulting hidden states used as the text embeddings?
Hello,
- QLLaMA inherits the weights of the stage-1 LLaMA-7B. For details on the stage-2 data, please refer to the paper: https://arxiv.org/pdf/2312.14238;
- For continuing with SFT, please refer to https://internvl.readthedocs.io/en/latest/internvl2.0/finetune.html
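For readers following the question about the stage-1 loss: the training objective discussed above is a CLIP-style symmetric contrastive (InfoNCE) loss between pooled image features and pooled text features. Below is a minimal sketch of that loss in NumPy. It is an illustration only, not the InternVL implementation: the function names, feature shapes, and the temperature value are assumptions, and in the actual pipeline the features would come from InternViT and the LLM's hidden states rather than random arrays.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project feature vectors onto the unit sphere."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_style_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (image, text) embeddings.

    img_feats, txt_feats: (B, D) arrays; row i of each is a matching pair.
    temperature: hypothetical value; the real training recipe may differ.
    """
    img = l2_normalize(img_feats)
    txt = l2_normalize(txt_feats)
    logits = img @ txt.T / temperature      # (B, B) cosine-similarity matrix
    labels = np.arange(len(logits))         # matching pairs lie on the diagonal

    def cross_entropy(lg):
        # Numerically stable log-softmax over each row.
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With perfectly aligned features the loss approaches zero, while mismatched random features give a loss near `log(B)`, which is one quick way to sanity-check a contrastive setup.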