HiTUT Reproduced results are very poor

Hello @594zyc, I want to use HiTUT as a baseline for my research, but I cannot reproduce the performance of your pretrained weights. Here are my results:

| | Valid seen SR (PW) | Valid seen GC (PW) | Valid unseen SR (PW) | Valid unseen GC (PW) |
|---|---|---|---|---|
| Pretrained | 30.12 (16.81) | 39.73 (22.72) | 14.13 (6.85) | 26.18 (12.78) |
| Reproduce 1 (8 epochs) | 10.73 (4.73) | 18.97 (10.52) | 4.26 (1.69) | 15.19 (6.23) |
| Reproduce 2 (early stop, 9 epochs) | 16.83 (8.87) | 24.99 (14.49) | 4.87 (2.12) | 13.73 (6.60) |

The training command is the same as the one mentioned in the README. Did I miss any important details?

Many thanks!

Oct 28 '22 10:10 RavenKiller

What was the command you used for evaluation? In particular, did you set --max_high_fails 9?

Nov 13 '22 02:11 594zyc

@594zyc Thanks for your response. Here are my evaluation commands:

python models/eval/eval_mmt.py --eval_path exp/mymodel/noskip_lr_mix_all_E-xavier768d_L12_H768_det-sep_dp0.1_di0.1_step_lr5e-05_0.999_type_sd999 --ckpt model_best_seen.pth --gpu --max_high_fails 9 --max_fails 10 --eval_split valid_seen --eval_enable_feat_posture --num_threads 4 --name_temp eval_valid_seen
python models/eval/eval_mmt.py --eval_path exp/mymodel/noskip_lr_mix_all_E-xavier768d_L12_H768_det-sep_dp0.1_di0.1_step_lr5e-05_0.999_type_sd999 --ckpt model_best_unseen.pth --gpu --max_high_fails 9 --max_fails 10 --eval_split valid_unseen --eval_enable_feat_posture --num_threads 4 --name_temp eval_valid_unseen

--max_high_fails was set to 9 in both runs.

Nov 15 '22 03:11 RavenKiller