
Why does InternVL3 use class_embedding in the code but discard it later?

Open siri-xr opened this issue 5 months ago • 2 comments

I noticed that InternVL3's code defines a class_embedding variable, but it appears to be discarded or left unused in later stages. Could you clarify:

What was the original purpose of this class_embedding?

Why was it removed or left unused in the final implementation?

Are there plans to repurpose it in future updates?

siri-xr avatar Jul 16 '25 03:07 siri-xr

Where is the code you are referring to? Is it this? https://github.com/OpenGVLab/InternVL/blob/51ac0b1daf0589c00c760681470006768b396290/clip_benchmark/clip_benchmark/models/internvl_huggingface/modeling_internvl.py#L60

northeastsquare avatar Jul 19 '25 01:07 northeastsquare

The class_embedding is introduced during the vision-only pre-training stage, where it is used to compute the CLIP loss. We remove the resulting class token later to ensure compatibility with pixel shuffle, which compresses the visual tokens over the 2D patch grid. The class token has no position in that grid, so it is redundant in this case.

Weiyun1025 avatar Sep 01 '25 17:09 Weiyun1025
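
To illustrate the answer above, here is a minimal pure-Python sketch of why a leading class token does not fit a 2D token compression step. It is not InternVL's actual code: the function names `drop_class_token` and `merge_patch_blocks`, the 4x4 grid, the 3-channel tokens, and the channel ordering inside each merged token are all assumptions chosen for illustration.

```python
def drop_class_token(tokens):
    """Return only the patch tokens, assuming the class token is at index 0."""
    return tokens[1:]


def merge_patch_blocks(patch_tokens, grid_h, grid_w, block=2):
    """Merge each block x block neighborhood of a row-major patch grid into
    one token by concatenating channels (a pixel-shuffle-style compression).

    Hypothetical helper: the concatenation order is an illustrative choice.
    """
    if len(patch_tokens) != grid_h * grid_w:
        raise ValueError(
            f"expected {grid_h * grid_w} grid tokens, got {len(patch_tokens)}"
        )
    if grid_h % block or grid_w % block:
        raise ValueError("grid size must be divisible by the block size")

    merged = []
    for top in range(0, grid_h, block):
        for left in range(0, grid_w, block):
            token = []
            for dy in range(block):
                for dx in range(block):
                    token.extend(patch_tokens[(top + dy) * grid_w + (left + dx)])
            merged.append(token)
    return merged


# Toy input: 1 class token followed by a 4x4 grid of 3-channel patch tokens.
class_token = [0.0, 0.0, 0.0]
patches = [[float(i)] * 3 for i in range(16)]
sequence = [class_token] + patches  # 17 tokens in total

# With the class token included, the sequence no longer maps onto the grid.
try:
    merge_patch_blocks(sequence, grid_h=4, grid_w=4)
    fits_with_class_token = True
except ValueError:
    fits_with_class_token = False

# After dropping it, the 16 patch tokens compress to 4 tokens of 12 channels.
compressed = merge_patch_blocks(drop_class_token(sequence), grid_h=4, grid_w=4)
print(fits_with_class_token, len(compressed), len(compressed[0]))
```

In this toy setup the 17-token sequence is rejected because it cannot be laid out on the 4x4 grid, while the 16 patch tokens reduce to a quarter as many tokens with four times the channels.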