LLaVA-NeXT

Parameter selection for mm_newline_position

Open wade0604 opened this issue 1 year ago • 6 comments

I see in the code that mm_newline_position can be set to grid, one_token, frame, or no_token. What exactly does each of these options mean?

wade0604 avatar Sep 24 '24 04:09 wade0604

mm_newline_position controls where the newline token is inserted into the visual tokens:

  1. Grid: Inserted after every row (the same as image modality)
  2. One_token: Inserted after all frames.
  3. Frame: Inserted after each frame.
  4. No_token: Do not insert.
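The four options above can be illustrated with a small sketch. This is not the LLaVA-NeXT implementation: `insert_newlines`, `NEWLINE`, and the string "tokens" are hypothetical stand-ins chosen for illustration, and the real code operates on embedding tensors rather than lists of strings.

```python
from typing import List

# Hypothetical stand-in for the learned newline embedding (an assumption,
# not the actual object used in the repository).
NEWLINE = "<nl>"

Frame = List[List[str]]  # one frame = rows of visual tokens


def insert_newlines(frames: List[Frame], mode: str) -> List[str]:
    """Flatten video frames into one token sequence, adding newline
    tokens according to the four modes described in the thread."""
    if mode not in ("grid", "frame", "one_token", "no_token"):
        raise ValueError(f"unknown mode: {mode}")

    out: List[str] = []
    for frame in frames:
        for row in frame:
            out.extend(row)
            if mode == "grid":
                out.append(NEWLINE)  # after every row of every frame
        if mode == "frame":
            out.append(NEWLINE)      # once after each frame
    if mode == "one_token":
        out.append(NEWLINE)          # once after all frames
    return out                       # "no_token" adds nothing


# Two toy frames, each a 2x2 grid of tokens.
frames = [
    [[f"f{f}r{r}c{c}" for c in range(2)] for r in range(2)]
    for f in range(2)
]

for mode in ("grid", "frame", "one_token", "no_token"):
    print(mode, len(insert_newlines(frames, mode)))
```

With F frames of H rows and W columns, the sequence lengths in this sketch are F*H*(W+1) for grid, F*(H*W+1) for frame, F*H*W+1 for one_token, and F*H*W for no_token. If each frame were pooled to 13x13 tokens (an assumption, not confirmed in this thread), grid mode would give 13 rows of 14 tokens per frame, which would match the 13x14 shape asked about later in the thread.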

ZhangYuanhan-AI avatar Sep 24 '24 08:09 ZhangYuanhan-AI

Hi! Which one should I use if I want to reproduce the results of video tasks? Also, which one do you use when you train the model?

ywh187 avatar Sep 29 '24 06:09 ywh187

What does "inserted after every row" mean? With this setting, each frame in the video has a 13x14 grid, but I don't understand what that means. Could you kindly explain why it is done this way?

MATTbomerts avatar Mar 19 '25 02:03 MATTbomerts

I have the same question. I noticed that in llava-ov-7b the mm_newline_position is one_token, but in llava-vid-7b it is grid. How does this special token function in LLaVA, especially in LLaVA-Video? I find that this token matters for token compression. I'd appreciate it if someone with relevant experience could explain.

cokeshao avatar Apr 22 '25 07:04 cokeshao

@cokeshao Have you solved this problem yet? I have the same question. Different models use different settings. If a setting differs from the model's original one, will the results be much worse?

qq1332427275 avatar Jun 11 '25 08:06 qq1332427275

I am still trying to solve this problem, but I haven't figured it out yet. I found that if LLaVA-Video uses one_token, the results are much worse. If you find anything interesting, please let me know. lol

cokeshao avatar Jun 11 '25 08:06 cokeshao