On the use of --rope_scaling in InternVL2
Hi,
I'm trying to extend the context length, which is 8192, of the InternVL2 models. I'm fine-tuning an InternVL2-2B on a custom dataset with LoRA, using --rope_scaling dynamic and --max_length 32000.
It is my understanding (maybe I'm just wrong) that the purpose of rope scaling is to rescale or interpolate the rotary positional embeddings, allowing the context length of a model to be extended.
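For reference, this is roughly what the "dynamic" (NTK-aware) variant does in HF-transformers-style rotary embeddings: instead of interpolating the position indices (the "linear" variant), it enlarges the rotary base once the sequence grows past the original maximum. A minimal sketch, with illustrative parameter names and defaults that are not taken from the InternVL code:

```python
import torch

def rope_inv_freq(head_dim: int, seq_len: int, max_pos: int = 8192,
                  base: float = 10000.0, factor: float = 4.0,
                  scaling: str = "dynamic") -> torch.Tensor:
    """Sketch of the "dynamic" (NTK-aware) RoPE scaling rule.

    When the running sequence length exceeds the original
    max_position_embeddings, the rotary base is enlarged so the
    frequencies stretch to cover positions beyond the pre-training range.
    """
    if scaling == "dynamic" and seq_len > max_pos:
        base = base * ((factor * seq_len / max_pos) - (factor - 1)) ** (head_dim / (head_dim - 2))
    # Inverse frequencies used to build the rotary sin/cos tables.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    return inv_freq
```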
The training process looks fine, but the generated model, after LoRA merging and inference using the code provided here, outputs this warning: "Token indices sequence length is longer than the specified maximum sequence length for this model (26333 > 8192). Running this sequence through the model will result in indexing errors", as if the model were still only able to handle a length of 8192, even though it was fine-tuned with longer examples and with the aforementioned command-line options. Thank you.
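For what it's worth, the quoted message is the tokenizer-side warning that transformers emits when the number of token ids exceeds tokenizer.model_max_length, which is not necessarily the same as the model's positional limit. A quick way to check (the merged-model path below is hypothetical):

```python
from transformers import AutoTokenizer

# Hypothetical local path to the LoRA-merged checkpoint.
tokenizer = AutoTokenizer.from_pretrained("./internvl2-2b-merged", trust_remote_code=True)

print(tokenizer.model_max_length)   # if this prints 8192, the warning comes from the tokenizer config
tokenizer.model_max_length = 32000  # align the tokenizer limit with the fine-tuned context length
```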
same problem...
This is strange, because this model's max_length was never 8192; check this file: https://huggingface.co/OpenGVLab/InternVL2-2B/blob/main/config.json. I also cannot reproduce this problem... However, I found that our code did not handle rope scaling for multi-modal models well; this was just fixed in #1612.
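To see what the released checkpoint actually advertises, you can also read the values straight from the config. A sketch assuming the InternVLChatConfig in the linked config.json exposes an llm_config sub-config:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("OpenGVLab/InternVL2-2B", trust_remote_code=True)

print(cfg.llm_config.max_position_embeddings)          # the language model's positional limit
print(getattr(cfg.llm_config, "rope_scaling", None))   # rope scaling settings, if any
```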