BERT4Rec

Code on ML-20m can't achieve the same performance as in the paper

Open outside-BUPT opened this issue 4 years ago • 4 comments

My result: hit@1:0.2399, ndcg@5:0.393, hit@5:0.536, ndcg@10:0.4379, hit@10:0.6718, ap:0.3794

Result from the paper: hit@1:0.3440, ndcg@5:0.4967, hit@5:0.6323, ndcg@10:0.5340, hit@10:0.7473, ap:0.4785
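To put the gap in relative terms, here is a small Python sketch using the two sets of numbers above. The dictionary and function names are illustrative only and do not come from the repository.

```python
# Metrics quoted above: my run vs. the numbers reported in the paper.
reproduced = {"hit@1": 0.2399, "ndcg@5": 0.393, "hit@5": 0.536,
              "ndcg@10": 0.4379, "hit@10": 0.6718, "ap": 0.3794}
paper = {"hit@1": 0.3440, "ndcg@5": 0.4967, "hit@5": 0.6323,
         "ndcg@10": 0.5340, "hit@10": 0.7473, "ap": 0.4785}


def relative_gaps(mine, reference):
    """Fraction by which each of my metrics falls short of the reference."""
    return {name: (reference[name] - mine[name]) / reference[name]
            for name in reference}


gaps = relative_gaps(reproduced, paper)
for name, gap in gaps.items():
    print(f"{name}: {gap:.1%} below the paper")
```

The shortfall is largest for hit@1 and smallest for hit@10, so the gap is concentrated at the top of the ranking.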

outside-BUPT avatar Aug 20 '21 07:08 outside-BUPT

@outside-BUPT How many steps did you train the model for?

asash avatar Dec 31 '21 22:12 asash

I ran run_ml-1m.sh and also got poor evaluation results, shown below. I noticed that the script sets the number of training steps to 400000.

dcg@1:0.016556291390728478, hit@1:0.016556291390728478, ndcg@5:0.041968522744744094, hit@5:0.06837748344370861, ndcg@10:0.06059810508379426, hit@10:0.12682119205298012, ap:0.06324608749550562, valid_user:6040.0
INFO:tensorflow:Inference Time : 27.27113s
I1007 17:31:45.042668 140571207984832 evaluation.py:269] Inference Time : 27.27113s
INFO:tensorflow:Finished evaluation at 2023-10-07-17:31:45
I1007 17:31:45.043014 140571207984832 evaluation.py:271] Finished evaluation at 2023-10-07-17:31:45
INFO:tensorflow:Saving dict for global step 400000: global_step = 400000, loss = 9.357097, masked_lm_accuracy = 0.0013245033, masked_lm_loss = 9.3572
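For anyone reading these numbers: the first metric and hit@1 are identical in the log. That is what the standard definitions give when each user has exactly one held-out target item, which is my assumption about the evaluation here. The sketch below shows those textbook definitions in Python. It is not the repository's evaluation code, the function names are made up for illustration, and treating `ap` as the reciprocal rank of the target item is also an assumption.

```python
import math


def metrics_for_rank(rank, ks=(1, 5, 10)):
    """Metrics for one user, given the 1-based rank of the single target item."""
    out = {}
    for k in ks:
        hit = 1.0 if rank <= k else 0.0
        out[f"hit@{k}"] = hit
        # With one relevant item the ideal DCG is 1, so NDCG is just the
        # discounted gain of the target item when it appears in the top k.
        out[f"ndcg@{k}"] = hit / math.log2(rank + 1)
    # Assumption: with one relevant item, average precision is 1 / rank.
    out["ap"] = 1.0 / rank
    return out


def average_metrics(ranks):
    """Average the per-user metrics over all evaluated users."""
    per_user = [metrics_for_rank(r) for r in ranks]
    return {name: sum(m[name] for m in per_user) / len(per_user)
            for name in per_user[0]}
```

At rank 1 the discount is log2(2) = 1, so under these definitions ndcg@1 always equals hit@1.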

oscarriddle avatar Oct 07 '23 10:10 oscarriddle

@oscarriddle The default number of training steps in this script is too low to reproduce the reported results. To get closer to them, you'll have to increase the number of steps by a factor of 20 (e.g. 8M or even more), but training will take a long time.

See our reproducibility paper on this https://browse.arxiv.org/pdf/2207.07483.pdf

asash avatar Oct 07 '23 10:10 asash

After training for about 35,000,000 steps, the metrics are still hovering around a rather low plateau, only just above the SASRec performance reported in the paper.

ndcg@1:0.24486754966887417, hit@1:0.24486754966887417, ndcg@5:0.4067822076254277, hit@5:0.5536423841059602, ndcg@10:0.44381509752787934, hit@10:0.6673841059602649, ap:0.38591140681367797, valid_user:6040.0
ndcg@1:0.24519867549668875, hit@1:0.24519867549668875, ndcg@5:0.40733902240485165, hit@5:0.5538079470198676, ndcg@10:0.44420200049712466, hit@10:0.666887417218543, ap:0.3865876645860639, valid_user:6040.0
ndcg@1:0.24486754966887417, hit@1:0.24486754966887417, ndcg@5:0.4067814130590064, hit@5:0.5524834437086092, ndcg@10:0.4443041962387484, hit@10:0.6675496688741722, ap:0.3864707774135114, valid_user:6040.0
ndcg@1:0.24519867549668875, hit@1:0.24519867549668875, ndcg@5:0.40754205560187584, hit@5:0.5539735099337748, ndcg@10:0.4447855702971012, hit@10:0.6683774834437086, ap:0.38679013575800086, valid_user:6040.0
ndcg@1:0.24437086092715232, hit@1:0.24437086092715232, ndcg@5:0.40725909975358604, hit@5:0.5543046357615894, ndcg@10:0.4443552628363676, hit@10:0.6683774834437086, ap:0.38621019782555793, valid_user:6040.0
ndcg@1:0.24370860927152319, hit@1:0.24370860927152319, ndcg@5:0.406638924762195, hit@5:0.5533112582781456, ndcg@10:0.4437388075143209, hit@10:0.6670529801324503, ap:0.385885343545252, valid_user:6040.0
ndcg@1:0.24503311258278146, hit@1:0.24503311258278146, ndcg@5:0.40708202926577325, hit@5:0.5533112582781456, ndcg@10:0.4445961973538064, hit@10:0.6685430463576159, ap:0.3864957036428388, valid_user:6040.0
ndcg@1:0.2455298013245033, hit@1:0.2455298013245033, ndcg@5:0.4077146645047479, hit@5:0.5543046357615894, ndcg@10:0.4448187866218854, hit@10:0.6683774834437086, ap:0.3868417302898083, valid_user:6040.0
ndcg@1:0.24519867549668875, hit@1:0.24519867549668875, ndcg@5:0.40756194630460774, hit@5:0.5543046357615894, ndcg@10:0.444561832998447, hit@10:0.6678807947019868, ap:0.386672691543328, valid_user:6040.0
ndcg@1:0.24420529801324503, hit@1:0.24420529801324503, ndcg@5:0.40684643995653613, hit@5:0.5536423841059602, ndcg@10:0.44364797730057237, hit@10:0.6663907284768212, ap:0.3860289918791995, valid_user:6040.0
ndcg@1:0.24486754966887417, hit@1:0.24486754966887417, ndcg@5:0.4070607410574634, hit@5:0.5531456953642384, ndcg@10:0.4442357483707688, hit@10:0.666887417218543, ap:0.3866187084025226, valid_user:6040.0
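To check how flat the curve is without eyeballing the log, a rough Python sketch along these lines can parse the evaluation lines and report the spread of one metric. The regular expression and helper names are my own and not part of the repository. It assumes each evaluation line uses the `name:value` format shown above, and `sample` holds only the first two lines of the log.

```python
import re

# Matches pairs such as "hit@10:0.6673" or "valid_user:6040.0".
METRIC_RE = re.compile(r"([a-z_]+(?:@\d+)?):(\d+(?:\.\d+)?)")


def parse_metric_line(line):
    """Return {metric_name: value} for one evaluation line."""
    return {name: float(value) for name, value in METRIC_RE.findall(line)}


def spread(lines, metric):
    """Return (min, max) of one metric over all lines that report it."""
    values = [parse_metric_line(line)[metric]
              for line in lines if metric + ":" in line]
    return min(values), max(values)


# First two evaluation lines from the log above, copied verbatim.
sample = [
    "ndcg@1:0.24486754966887417, hit@1:0.24486754966887417, "
    "ndcg@5:0.4067822076254277, hit@5:0.5536423841059602, "
    "ndcg@10:0.44381509752787934, hit@10:0.6673841059602649, "
    "ap:0.38591140681367797, valid_user:6040.0",
    "ndcg@1:0.24519867549668875, hit@1:0.24519867549668875, "
    "ndcg@5:0.40733902240485165, hit@5:0.5538079470198676, "
    "ndcg@10:0.44420200049712466, hit@10:0.666887417218543, "
    "ap:0.3865876645860639, valid_user:6040.0",
]

low, high = spread(sample, "hit@10")
print(f"hit@10 ranges from {low:.4f} to {high:.4f}")
```

Feeding in the full log instead of `sample` gives the spread over all evaluations.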

oscarriddle avatar Oct 13 '23 02:10 oscarriddle