Evaluation on EgoSchema
Thank you for open-sourcing this excellent work. I recently evaluated the model on the EgoSchema dataset using the commands provided in your README, but obtained an accuracy of 50.8%, whereas the paper reports 60.8%. Could you help me identify what might be wrong with my evaluation process? The commands I used are as follows:
For the inference step:

```bash
CUDA_VISIBLE_DEVICES=4,5,6,7 python main_repo.py \
    --model ./ckpt/Mistral-7B-Instruct-v0.2 \
    --text_encode clip \
    --dataset egoschema \
    --output_base_path output/egoschema/rep \
    --output_filename m7b_rephrase_egoschema.json \
    --num_examples_to_run -1 \
    --task sum \
    --prompt_type rephrase_sum_mistral \
    --num_iterations 1 \
    --num_chunks [4] \
    --merge_ratio 0.25 \
    --dst_stride 4 \
    --num_words_in_rephrase 20 \
    --num_words_in_sum 500 \
    --read_scales [-1]
```
For the evaluation step:

```bash
CUDA_VISIBLE_DEVICES=4,5,6,7 python main_ll_eval.py \
    --model ./ckpt/Mistral-7B-Instruct-v0.2 \
    --dataset egoschema \
    --output_base_path output/egoschema \
    --output_filename m7b_lleval_egoschema.json \
    --data_path output/egoschema/rep/m7b_rephrase_egoschema_data.json \
    --num_examples_to_run -1 \
    --prompt_type qa_ll_mistral
```
The result I got in m7b_lleval_egoschema.json:

```json
{
  "num_total": 500,
  "num_valids": 500,
  "num_corrects": 254,
  "acc": 0.508,
  "data": { ........
```
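In case it helps, here is a minimal sanity check I ran on the output file, recomputing the accuracy from the top-level fields shown above (the file path is my assumption, derived from the `--output_base_path` and `--output_filename` arguments of the evaluation command):

```python
import json

# Load the evaluation output (path assumed from the command-line arguments above).
with open("output/egoschema/m7b_lleval_egoschema.json") as f:
    result = json.load(f)

# Recompute accuracy as num_corrects / num_total and compare with the stored value.
acc = result["num_corrects"] / result["num_total"]
print(f'{result["num_corrects"]}/{result["num_total"]} = {acc:.3f}')  # 254/500 = 0.508
print(acc == result["acc"])  # True for the output shown above
```

The stored `acc` is consistent with `num_corrects / num_total`, so the gap to the reported 60.8% does not appear to come from the metric computation itself.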