[Help Wanted] Alignment with official accuracy for Llama-3.2-Vision
Does the repo support this model yet? Thanks!
Hi @droidXrobot @shan23chen! This repo now supports Llama-3.2-11B/90B-Vision-Instruct; you can use it with the newest transformers version (>=4.45.0.dev0). However, the evaluation results obtained with the current repo do not match the official results, and even after the hyperparameters and the system prompt are aligned, there is still a noticeable accuracy drop (mainly on AI2D). Is anyone willing to look into this problem?
Ref: https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/blob/main/generation_config.json https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/eval_details.md
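For reference, a minimal sanity-check sketch (not part of the official eval script) for confirming the installed transformers version and comparing your local sampling settings against the generation_config.json referenced above; it assumes you have access to the gated meta-llama repo on the Hub:

```python
# Sketch: verify transformers version and inspect the official generation config.
import transformers
from transformers import GenerationConfig

print("transformers version:", transformers.__version__)  # should be >= 4.45.0.dev0

# Requires `huggingface-cli login`, since the meta-llama repo is gated.
gen_cfg = GenerationConfig.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")
print("do_sample:  ", gen_cfg.do_sample)
print("temperature:", gen_cfg.temperature)
print("top_p:      ", gen_cfg.top_p)
```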
Actually, none of my benchmarks can reproduce the earlier results for the same model...
This repo updates too quickly; many changes could have caused the misalignment.
Would you please provide more information, such as the corresponding commit IDs of the previous and current code you used for evaluation, as well as the model and benchmarks you evaluated?
As a user, I cannot compare every commit to see what changed; that is the maintainers' responsibility.
The current situation is that all benchmark scores have dropped so much that the evaluation can almost be treated as wrong. The changes I can observe are:
- the tsv files are newly generated;
- there is an operation that did not exist before (I don't know what it is), and it is slow;
- the metrics are now lower on all benchmarks for the same model;
- I don't know what changed inside the evalkit.
I even suspected my training codebase had gone wrong, which stuck me for about a week; afterwards I realized the evaluation pipeline was broken and the old model could not reproduce its previous metrics.
Any suggestions?
At the very least, you need to provide some information so that we can help. Please tell me the model you are using and one or several of the benchmarks you are evaluating. If you cannot find the initial commit you were using, please try to remember when you first used this codebase.
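As a minimal sketch (assuming the repo was installed from a git checkout and the script is run from inside it), this is the kind of information that helps us reproduce the issue:

```python
# Sketch: collect the commit ID, commit date, and key package versions used for evaluation.
import subprocess
import transformers

commit = subprocess.run(
    ["git", "rev-parse", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip()
date = subprocess.run(
    ["git", "show", "-s", "--format=%ci", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip()

print("VLMEvalKit commit:", commit)
print("commit date:      ", date)
print("transformers:     ", transformers.__version__)
```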
Same Here: https://github.com/open-compass/VLMEvalKit/issues/503#issuecomment-2404134873
Also, if you want to go further with this problem, creating a new issue might be a better idea. Your problem is not related to the Llama-3.2 issue.
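In the meantime, one way to narrow down where the drop comes from might be to diff the per-sample predictions of an old run against a new one for the same model and benchmark. A rough sketch, assuming you still have both prediction files (file and column names below are placeholders; adjust them to whatever your runs actually produced):

```python
# Sketch: compare per-sample predictions from two evaluation runs.
import pandas as pd

old = pd.read_excel("old_run_predictions.xlsx")  # hypothetical path
new = pd.read_excel("new_run_predictions.xlsx")  # hypothetical path

# Assumes both files share an "index" key column and a "prediction" column.
merged = old.merge(new, on="index", suffixes=("_old", "_new"))
changed = merged[merged["prediction_old"] != merged["prediction_new"]]

print(f"{len(changed)} / {len(merged)} samples changed between runs")
print(changed.head())
```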
@kennymckormick Same issue here too: #523. Could you please check this one? Thanks!