Great job! I have some questions about the results.
I ran:
python train.py --digit --fix_src --dataset gsm8k --steps 120000 --weights_path /huyang/r1/diffusion-of-thoughts/plaid1b_weights/
python evaluation_batch.py --weights_path outputs/gsm8k-bs16-fix_src-digit-steps120000 --fix_src --digit --dataset gsm8k --score_temp 0.5
The result is:
[2025-02-24 13:14:58,570] total: 1319, corr: 68, acc: 0.05155420773313116
[2025-02-24 13:14:58,570] time: 315.3894371986389s
[2025-02-24 13:14:58,571] Mean: 0.05155420773313116, Std: 0.0
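As a quick sanity check on that log, here is a minimal sketch that parses the "total / corr / acc" line and recomputes the accuracy from the counts. The regular expression and the helper name parse_eval_line are my own assumptions based on this single log line, not something taken from evaluation_batch.py.

```python
import re

# Log line copied from the evaluation output in this post.
log_line = "[2025-02-24 13:14:58,570] total: 1319, corr: 68, acc: 0.05155420773313116"

# Assumed pattern, inferred from the one log line above (not from the script's source).
pattern = re.compile(r"total: (\d+), corr: (\d+), acc: ([0-9.]+)")


def parse_eval_line(line):
    """Return (total, corr, acc) from a log line, or None if it does not match."""
    match = pattern.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


total, corr, acc = parse_eval_line(log_line)

# Recompute accuracy from the raw counts and compare with the logged value.
recomputed = corr / total
print(f"{corr}/{total} = {recomputed:.4%}")
print("matches logged acc:", abs(recomputed - acc) < 1e-9)
```

So the logged acc is simply corr / total over the 1319 test problems, i.e. only 68 of them were answered correctly in this run.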
Am I doing this right? Thank you so much for looking into this issue.
I found that the acc of 0.05 was caused by my incomplete training data. After switching to the correct gsm8k data, the result is a lot better, but there is still an issue.
The training and evaluation commands are:
python train.py --digit --fix_src --dataset gsm8k --steps 120000 --weights_path /huyang/r1/diffusion-of-thoughts/plaid1b_weights/
python evaluation_batch.py --weights_path outputs/gsm8k-bs16-fix_src-digit-steps120000 --fix_src --digit --dataset gsm8k --score_temp 0.5
The final result is acc: 0.19863532979529946 (about 19.9%), which is still well below the 32.6 reported in the paper.
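To put the gap in concrete terms, here is a rough calculation. It assumes the test set is still the 1319 problems from the earlier log (the second run's total is not shown above) and that the paper's 32.6 is a percentage; the variable names are just for illustration.

```python
# Values taken from this thread.
total = 1319                  # assumed: same test-set size as in the first log
acc = 0.19863532979529946     # accuracy from the second run
paper_acc_percent = 32.6      # number quoted from the paper, assumed to be a percentage

# Recover how many problems were answered correctly.
correct = round(acc * total)

# Gap to the paper, in percentage points and in number of problems.
acc_percent = acc * 100
gap_points = paper_acc_percent - acc_percent
needed_correct = round(paper_acc_percent / 100 * total)

print(f"correct: {correct}/{total} ({acc_percent:.2f}%)")
print(f"gap to paper: {gap_points:.2f} points")
print(f"problems needed for {paper_acc_percent}%: about {needed_correct}")
```

Under those assumptions the run is roughly 13 points short, which seems too large to be explained by sampling noise alone.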