How to evaluate other tasks (benchmarks)

Open JunseokLee98 opened this issue 9 months ago • 1 comments

Thank you for sharing your code. I ran sh lmms_eval_magma.sh, but I was only able to evaluate the textvqa benchmark, even though I changed the name of the eval_tasks argument. Could you please let me know how to evaluate other benchmarks? My shell script lmms_eval_magma.sh is below.

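# Usage: sh lmms_eval_magma.sh [tasks] [num_processes]
eval_tasks=${1:-textvqa}      # first positional argument; defaults to textvqa
NUM_PROCESSES=${2:-4}         # second positional argument; defaults to 4
# Export so the launched Python processes inherit the allocator setting.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
export CUDA_VISIBLE_DEVICES=4,5,6

nohup python3 -m accelerate.commands.launch --num_processes="$NUM_PROCESSES" -m lmms_eval --model magma --model_args pretrained="microsoft/Magma-8B" \
    --tasks "$eval_tasks" --batch_size 1 --log_samples --log_samples_suffix magma_textvqa --output_path ./logs/ &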

Mar 05 '25 09:03 JunseokLee98

Interesting, it worked on my end. I evaluated many benchmarks.

Mar 05 '25 17:03 jwyang

Oh, there was a misunderstanding about the arguments. I think it would be helpful to document this in the README, so I sent #PR33. Please review it.
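
For anyone who hits the same confusion, here is a minimal sketch of how the first two lines of the script above read their values. The function resolve_args is a hypothetical helper written only for this illustration; it is not part of the repository. The task names mme and pope are assumed examples, so check which tasks your installed lmms_eval actually registers.

```shell
#!/bin/sh
# Hypothetical helper that mirrors the first two lines of lmms_eval_magma.sh.
# It only shows how the positional arguments are resolved; it launches nothing.
resolve_args() {
    eval_tasks=${1:-textvqa}    # first positional argument, falls back to textvqa
    NUM_PROCESSES=${2:-4}       # second positional argument, falls back to 4
    echo "tasks=$eval_tasks procs=$NUM_PROCESSES"
}

resolve_args                # no arguments: both defaults are used
resolve_args mme            # assumed task name; process count stays at default
resolve_args mme,pope 3     # comma-separated task list and explicit process count
```

With the script as posted, the equivalent call would be something like sh lmms_eval_magma.sh mme,pope 3. Renaming the variable inside the script does not change which tasks run, since the value comes from the first command-line argument and falls back to textvqa when none is given.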

Mar 07 '25 06:03 JunseokLee98