Zhe Chen

316 comments by Zhe Chen

Is there any error message when starting the model worker?

Hello, thank you for your attention. You can now deploy the InternVL2 model following this document: [https://internvl.readthedocs.io/en/latest/internvl2.0/deployment.html](https://internvl.readthedocs.io/en/latest/internvl2.0/deployment.html)

The automatic allocation produced by transformers' `device_map='auto'` may not be well balanced. In that case, you can assign modules to GPUs manually to maximize memory utilization, for example: ```python device_map...
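As a rough sketch of such a manual allocation (the module names and layer counts below are assumptions based on the usual InternVL2 layout; check them against your checkpoint's `config` before use):

```python
# Hypothetical sketch: spread an InternVL2 model over several GPUs by hand.
# Keep the vision encoder, projector, and embeddings on GPU 0, and split the
# LLM decoder layers evenly across the available GPUs.

def split_model(num_llm_layers, num_gpus=2):
    """Build a device_map dict assigning each module to a GPU index."""
    device_map = {
        "vision_model": 0,                      # vision encoder on GPU 0
        "mlp1": 0,                              # vision-to-LLM projector
        "language_model.model.embed_tokens": 0, # input embeddings
        "language_model.model.norm": num_gpus - 1,
        "language_model.lm_head": num_gpus - 1, # output head on the last GPU
    }
    # Spread decoder layers evenly: ceil(num_llm_layers / num_gpus) per GPU.
    layers_per_gpu = (num_llm_layers + num_gpus - 1) // num_gpus
    for i in range(num_llm_layers):
        device_map[f"language_model.model.layers.{i}"] = i // layers_per_gpu
    return device_map

# Usage: pass the dict instead of 'auto', e.g.
# model = AutoModel.from_pretrained(path, device_map=split_model(32), ...)
```

You can also skew the split (e.g. fewer layers on GPU 0, which already holds the vision encoder and the KV cache for the image tokens) if one card runs out of memory first.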

As far as I know, this phenomenon is very common when training large models. To some extent, it reflects the model overfitting the training set.

> Would two 4090s be enough? I have a 3090, thinking on getting another one.

Hello, thank you for your attention. You can now deploy the InternVL2 model following this...

Because there is very little ID-card data in the training set, the model does not handle ID cards well yet.

The maximum context window during training is 4096; at inference time it can be extended to 10k, which we have tested without problems. On the demo, you can control the output length by adjusting Max output tokens:
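Programmatically, the same knob corresponds to the generation config passed to the model's `chat()` interface (names as in the public README; treat them as assumptions for your exact version):

```python
# Hedged sketch: cap the response length when calling InternVL's chat API.
# max_new_tokens limits only the generated output, not the input context.
generation_config = dict(max_new_tokens=1024, do_sample=False)

# response = model.chat(tokenizer, pixel_values, question, generation_config)
```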

I don't think pre-training from scratch is necessary: a model trained at 4k can be extended directly to 8k-10k without major problems. If you want to go to even longer lengths, you may need an additional fine-tuning pass on long data. You can also try our recently released [Mini-InternVL-Chat-2B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-2B-V1-5) and [Mini-InternVL-Chat-4B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5); both were SFT'd at 8k length.

The 4B model's problem comes from the Phi3 language model itself: Phi3's vocabulary is too small, so its Chinese support is very poor. At this point it looks unfixable, and we will avoid using Phi3 to train models in the future.