Why not cite FastV?

Open MiloQ opened this issue 11 months ago • 1 comment

In my opinion, the part of LLaVA-Mini where the model integrates image information in the shallow layers has a lot in common with FastV. Have the authors looked at this paper?

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models (https://arxiv.org/abs/2403.06764)

I would expect anyone studying VLM token compression to have read FastV.

Jan 22 '25 02:01 MiloQ

@MiloQ @zhangshaolei1998

As the author of FastV (https://arxiv.org/abs/2403.06764, ECCV 2024, posted on arXiv in March 2024), I must express my serious concern regarding LLaVA-Mini. I've identified numerous "contributions" in their paper that appear identical to our work. What's particularly troubling is that LLaVA-Mini fails to cite our work despite these striking similarities.

For example, comparing Figure 4 in FastV with Figure 4 in LLaVA-Mini reveals identical token nomenclature and even the same color scheme. The resemblance is so precise that it suggests potential use of our codebase. This is particularly concerning given that LLaVA-Mini presents these elements as novel contributions.

Figure 4 in FastV

Figure 4 in LLaVA-Mini

I formally request that the LLaVA-Mini authors clarify these similarities, properly acknowledge our prior work, and remove the false claim from the paper.

Mar 02 '25 14:03 chenllliang