Xuandi FU comments


Results: 2 comments by Xuandi FU

LongBench v2 Leaderboard Submission Request: Qwen2.5-14B & Gemini2.0 Flash Experimental Results

Could you please share how you evaluated the Gemini-2.0-Flash-Exp model? Specifically, how could we truncate the model input, and what decoding parameters were used? We also evaluated the Gemini-2.0-Flash-Exp model and...

[Evaluation] Failed to reproduce Qwen2-VL-7B MMMU result

Seeing the same issue when trying to reproduce **Qwen2-VL-2B**'s results on MMMU: I got 38.78 on MMMU_val with VLMEvalKit, which doesn't match the result (41.1) reported in the paper....