Great job, questions about the results

Open yanghu819 opened this issue 1 year ago • 1 comments

I ran:

python train.py --digit --fix_src --dataset gsm8k --steps 120000 --weights_path /huyang/r1/diffusion-of-thoughts/plaid1b_weights/

python evaluation_batch.py --weights_path outputs/gsm8k-bs16-fix_src-digit-steps120000 --fix_src --digit --dataset gsm8k --score_temp 0.5

The result is:

[2025-02-24 13:14:58,570] total: 1319, corr: 68, acc: 0.05155420773313116
[2025-02-24 13:14:58,570] time: 315.3894371986389s
[2025-02-24 13:14:58,571] Mean: 0.05155420773313116, Std: 0.0
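As a sanity check on the log, the reported accuracy matches corr / total. The sketch below recomputes it from the logged counts. It assumes the Mean and Std lines aggregate over a single evaluation run, which would explain Std: 0.0; how evaluation_batch.py actually aggregates runs is not shown here, so the variable names and the use of a population standard deviation are illustrative only.

```python
import statistics

# Counts copied from the log line above.
total = 1319
correct = 68

# Accuracy as a fraction in [0, 1], not a percentage.
accuracy = correct / total

# Assumption: only one evaluation run was aggregated, so the list has one entry.
run_accuracies = [accuracy]
mean_acc = statistics.mean(run_accuracies)
std_acc = statistics.pstdev(run_accuracies)  # population std of one value is 0.0

print(f"acc: {accuracy}, mean: {mean_acc}, std: {std_acc}")
```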

Am I doing this right? Thank you so much for looking into this issue.

Feb 24 '25 05:02 yanghu819

I found that acc: 0.05 was caused by my incomplete training data. After switching to the correct gsm8k data, the result is a lot better, but there is still a gap.

The training and evaluation commands are:

python train.py --digit --fix_src --dataset gsm8k --steps 120000 --weights_path /huyang/r1/diffusion-of-thoughts/plaid1b_weights/

python evaluation_batch.py --weights_path outputs/gsm8k-bs16-fix_src-digit-steps120000 --fix_src --digit --dataset gsm8k --score_temp 0.5

The final result is acc: 0.19863532979529946, which still falls short of the 32.6 reported in the paper.
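To put the two numbers on the same scale, here is a small sketch that converts the logged fraction to a percentage. It assumes the paper's 32.6 is an accuracy in percent and that the test set still has 1319 examples, as in the earlier log; both are assumptions, not something the evaluation script prints.

```python
# Accuracy logged by the evaluation script, as a fraction.
reported_acc = 0.19863532979529946

# Assumption: the paper figure is accuracy in percent.
paper_acc_percent = 32.6

# Assumption: same 1319-example test set as in the earlier log.
total = 1319

acc_percent = reported_acc * 100
gap_points = paper_acc_percent - acc_percent
approx_correct = round(reported_acc * total)

print(f"reproduced: {acc_percent:.2f}%, paper: {paper_acc_percent}%")
print(f"gap: {gap_points:.2f} percentage points")
print(f"approx. correct answers: {approx_correct} of {total}")
```

Under these assumptions the reproduction is roughly 12.7 percentage points below the paper figure.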

Feb 27 '25 11:02 yanghu819