Parameter selection for mm_newline_position
I see in the code that mm_newline_position can be set to grid, one_token, frame, or no_token. What exactly does each of these options mean?
mm_newline_position sets where the newline token is inserted:
- Grid: Inserted after every row (the same as image modality)
- One_token: Inserted after all frames.
- Frame: Inserted after each frame.
- No_token: Do not insert.
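The four options above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the repository's actual implementation: the function name insert_newlines, the "<nl>" placeholder token, and the nested-list frame layout are all hypothetical choices made for the example.

```python
from typing import List

Frame = List[List[str]]  # one frame = rows of patch tokens (hypothetical layout)

MODES = ("grid", "frame", "one_token", "no_token")


def insert_newlines(frames: List[Frame], mode: str, newline: str = "<nl>") -> List[str]:
    """Flatten video frames into one token sequence, adding newline tokens per `mode`."""
    if mode not in MODES:
        raise ValueError(f"unknown mm_newline_position: {mode!r}")
    out: List[str] = []
    for frame in frames:
        for row in frame:
            out.extend(row)
            if mode == "grid":       # newline after every row of every frame
                out.append(newline)
        if mode == "frame":          # newline after each frame
            out.append(newline)
    if mode == "one_token":          # single newline after all frames
        out.append(newline)
    return out                       # "no_token" adds nothing


# Two toy frames, each with 2 rows of 3 patch tokens.
frames = [[[f"f{t}r{r}c{c}" for c in range(3)] for r in range(2)] for t in range(2)]
counts = {mode: len(insert_newlines(frames, mode)) for mode in MODES}
print(counts)
```

For T frames of H rows and W columns, this sketch yields T*H*(W+1) tokens for grid, T*(H*W+1) for frame, T*H*W+1 for one_token, and T*H*W for no_token, so the choice also changes the sequence length the language model sees.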
Hi! Which one should I use if I want to reproduce the results of video tasks? Also, which one do you use when you train the model?
What does "inserted after every row" mean? With this setting, each frame in the video becomes a 13x14 grid, but I don't understand what that means. Could you kindly explain why it is done this way?
I have the same question. I noticed that in llava-ov-7b the mm_newline_position is one_token, but in llava-vid-7b it is grid. How does this special token function in LLaVA, especially in LLaVA-Video? I find that this token matters for token compression. I'd appreciate it if someone with relevant experience could explain.
@cokeshao Have you solved this problem now? I also have the same question. Different settings are used on different models. If the settings are different from the original settings, will the results be very poor?
I am still trying to solve this problem, but I don't have an answer yet. I found that if LLaVA-Video uses one_token, the results are much worse. If you find anything interesting, please let me know. lol