Evaluation on EgoSchema
Thank you for open-sourcing this excellent work. I recently evaluated the model on the EgoSchema dataset using the commands provided in your README, but obtained an accuracy of 50.8%, whereas the paper reports 60.8%. Could you help me identify what might be wrong with my evaluation process? The commands I used are as follows:
For the inference step:

```bash
CUDA_VISIBLE_DEVICES=4,5,6,7 python main_repo.py \
    --model ./ckpt/Mistral-7B-Instruct-v0.2 \
    --text_encode clip \
    --dataset egoschema \
    --output_base_path output/egoschema/rep \
    --output_filename m7b_rephrase_egoschema.json \
    --num_examples_to_run -1 \
    --task sum \
    --prompt_type rephrase_sum_mistral \
    --num_iterations 1 \
    --num_chunks [4] \
    --merge_ratio 0.25 \
    --dst_stride 4 \
    --num_words_in_rephrase 20 \
    --num_words_in_sum 500 \
    --read_scales [-1]
```
For the evaluation step:

```bash
CUDA_VISIBLE_DEVICES=4,5,6,7 python main_ll_eval.py \
    --model ./ckpt/Mistral-7B-Instruct-v0.2 \
    --dataset egoschema \
    --output_base_path output/egoschema \
    --output_filename m7b_lleval_egoschema.json \
    --data_path output/egoschema/rep/m7b_rephrase_egoschema_data.json \
    --num_examples_to_run -1 \
    --prompt_type qa_ll_mistral
```
The result I got in m7b_lleval_egoschema.json:

```json
{
  "num_total": 500,
  "num_valids": 500,
  "num_corrects": 254,
  "acc": 0.508,
  "data": { ........
```
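In case it helps, here is a minimal sanity check I ran on the output file, recomputing the accuracy from the top-level fields shown above (the file path is my assumption, derived from the `--output_base_path` and `--output_filename` arguments of the evaluation command):

```python
import json

# Load the evaluation output (path assumed from the command-line arguments above).
with open("output/egoschema/m7b_lleval_egoschema.json") as f:
    result = json.load(f)

# Recompute accuracy as num_corrects / num_total and compare with the stored value.
acc = result["num_corrects"] / result["num_total"]
print(f'{result["num_corrects"]}/{result["num_total"]} = {acc:.3f}')  # 254/500 = 0.508
print(acc == result["acc"])  # True for the output shown above
```

The stored `acc` is consistent with `num_corrects / num_total`, so the gap to the reported 60.8% does not appear to come from the metric computation itself.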