[🐛BUG] Batch evaluation is not applied
Describe the bug
While training a model, I found that the evaluation batch size is effectively 1, so it does not depend on the eval_batch_size that I set.
To Reproduce
Steps to reproduce the behavior:
python3 run_recbole.py --model=BPR --dataset=MY_CUSTOM_DATA --config_files=config_for_test_general_BPR.yaml
Output received:
...
Training Hyper Parameters:
epochs = 100
train_batch_size = 16384
learner = adam
learning_rate = 0.001
train_neg_sample_args = {'distribution': 'uniform', 'sample_num': 1, 'alpha': 1.0, 'dynamic': False, 'candidate_num': 0}
eval_step = 10
stopping_step = 3
clip_grad_norm = None
weight_decay = 0.0
loss_decimal_place = 4
...
Evaluation Hyper Parameters:
eval_args = {'split': {'RS': [0.8, 0.1, 0.1]}, 'order': 'TO', 'group_by': 'user', 'mode': {'valid': 'full', 'test': 'full'}}
repeatable = False
metrics = ['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']
topk = [10]
valid_metric = NDCG@10
valid_metric_bigger = True
eval_batch_size = 16384
metric_decimal_place = 4
...
The number of users: 95882
Average actions of users: 36.13249757511916
The number of items: 50149
Average actions of items: 69.08391162160007
The number of inters: 3464420
...
Train 0: 100%|██████████| 171/171 [00:15<00:00, 11.13it/s]
....
Evaluate : 100%|██████████| 95881/95881 [04:19<00:00, 369.70it/s]
We see that the number of evaluation iterations (95881) essentially equals 'The number of users' (95882), whereas during training the batch is clearly large (16384).
Desktop (please complete the following information):
- OS: Linux
- RecBole Version 1.1.1
- Python Version 3.10.6
- PyTorch Version 2.0.1
In some answers to other issues (for example https://github.com/RUCAIBox/RecBole/issues/1866) I see that you significantly increase the value of eval_batch_size (by about 1000 times). I tried this technique and got:
train_batch_size = 16384
eval_batch_size = 163840000
The number of users: 173932
Average actions of users: 39.204006186361255
The number of items: 75004
Average actions of items: 90.91359012306174
The number of inters: 6818792
and with these parameters:
Train 0: 100%|██████████| 337/337 [01:15<00:00, 4.47it/s]
Evaluate : 100%|██████████| 80/80 [01:01<00:00, 1.31it/s]
I observe that the evaluation stage now has only 80 batches. Could you please clarify why? I do not understand where this number (80) comes from.
@SergeyPetrakov Hello!
Actually, the real eval_batch_size = eval_batch_size (that you set) // item_number, because each test case needs to be scored against all items, so each user takes up item_number rows.
Therefore, when you set eval_batch_size = 163840000, the real eval_batch_size = 163840000 // 75004 = 2184 users.
So this number (80) is ceil(173932 / 2184): 173932 // 2184 = 79 full batches plus one partial batch.
Great! Thank you for the answer!
@TayTroye If the real eval_batch_size = eval_batch_size (that you set) // item_number = 0 (e.g. 128 // 1000), what is the value of the real eval_batch_size in that case?
Thanks!
In that case the real eval_batch_size will be 1. You can check it in our code: https://github.com/RUCAIBox/RecBole/blob/00c018ed4458c20edf1d62ffc7f5f956ea5d3d42/recbole/data/dataloader/general_dataloader.py#L244
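The rule described in this thread for full-ranking evaluation (floor division by the item count, with a minimum of 1) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not RecBole's actual code: the function name `full_sort_eval_batches` and its parameters are hypothetical, and it assumes that every counted user is evaluated and that a final partial batch counts as one iteration.

```python
import math


def full_sort_eval_batches(eval_batch_size: int, item_num: int, num_eval_users: int):
    """Return (users_per_batch, num_batches) for full-ranking evaluation.

    Hypothetical helper: it mirrors the rule stated in this thread,
    not the library's implementation.
    """
    # Each user is scored against every item, so one user costs item_num rows.
    # If the configured size is smaller than item_num, fall back to 1 user.
    users_per_batch = max(eval_batch_size // item_num, 1)
    # A trailing partial batch still counts as one iteration.
    num_batches = math.ceil(num_eval_users / users_per_batch)
    return users_per_batch, num_batches


# Second run from this issue: 163840000 // 75004 = 2184 users per batch.
print(full_sort_eval_batches(163840000, 75004, 173932))  # (2184, 80)
# First run from this issue: 16384 // 50149 = 0, so 1 user per batch.
print(full_sort_eval_batches(16384, 50149, 95881))       # (1, 95881)
# The 128 // 1000 case asked about above; the user count 500 is made up.
print(full_sort_eval_batches(128, 1000, 500))            # (1, 500)
```

Under these assumptions, making evaluation batches larger in `full` mode means setting eval_batch_size to roughly (desired users per batch) * (number of items).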
https://github.com/RUCAIBox/RecBole/blob/00c018ed4458c20edf1d62ffc7f5f956ea5d3d42/recbole/data/dataloader/general_dataloader.py#L244-L253
@TayTroye Thank you for your reply.
I use SASRec, a sequential recommender, so self.is_sequential == True.
I'm very confused by the following:
Dataset: Steam, downloaded from the processed datasets: https://drive.google.com/drive/folders/1ahiLmzU7cGRPXf5qGMqtAChte2eYp9gI
09 Oct 16:59 INFO steam
The number of users: 2567539
Average actions of users: 1.1852385436943873
The number of items: 14431
Average actions of items: 210.8901593901594
The number of inters: 3043145
The sparsity of the dataset: 99.99178686104865%
Remain Fields: ['user_id', 'product_id', 'timestamp']
3 cases:
eval_args.mode: pop100
train_batch_size: 128
eval_batch_size: 128
==>
Train : 100%|██████████| 2972/2972
Evaluate : 100%|██████████| 36586/36586
eval_args.mode: pop100
train_batch_size: 128
eval_batch_size: 1280
==>
Train : 100%|██████████| 2972/2972
Evaluate : 100%|██████████| 3049/3049
eval_args.mode: pop100
train_batch_size: 128
eval_batch_size: 4096
==>
Train : 100%|██████████| 2972/2972
Evaluate : 100%|██████████| 915/915
I don't know how the numbers of Evaluate iterations (36586, 3049, 915) are computed.
I really need your help, thank you!
When you set mode: pop100, the batch_size and step are set like this:
https://github.com/RUCAIBox/RecBole/blob/00c018ed4458c20edf1d62ffc7f5f956ea5d3d42/recbole/data/dataloader/general_dataloader.py#L120
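As a rough illustration of how the three Evaluate counts could arise, the sketch below packs users into a batch greedily until the configured eval_batch_size would be exceeded. It is a simplified reading of the linked code, not a copy of it. The function name `neg_sample_eval_step` is hypothetical, and the inputs are assumptions inferred from this thread: 36586 evaluated users (the iteration count of the eval_batch_size = 128 run), one held-out item per user, and 101 scored rows per item (1 positive plus 100 sampled negatives).

```python
import math


def neg_sample_eval_step(eval_batch_size, items_per_user, times):
    """Count how many users (largest first) fit into one evaluation batch.

    Hypothetical helper: items_per_user holds each user's number of
    ground-truth items, and times is the number of rows scored per item.
    """
    rows_per_user = sorted((n * times for n in items_per_user), reverse=True)
    step = 1
    total = rows_per_user[0]  # the largest user is always taken, even if too big
    for i in range(1, len(rows_per_user)):
        if total + rows_per_user[i] > eval_batch_size:
            break
        step = i + 1
        total += rows_per_user[i]
    return step


num_eval_users = 36586                  # assumption inferred from the logs
items_per_user = [1] * num_eval_users   # assumption: one held-out item each
for eval_batch_size in (128, 1280, 4096):
    step = neg_sample_eval_step(eval_batch_size, items_per_user, times=101)
    print(eval_batch_size, step, math.ceil(num_eval_users / step))
# prints:
# 128 1 36586
# 1280 12 3049
# 4096 40 915
```

Under these assumptions each user costs 101 rows, so the number of users per batch is eval_batch_size // 101 (at least 1), which matches the reported 36586, 3049, and 915 iterations.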
@TayTroye thx
Hello, the code you pointed to does explain my question, but I still have a few questions.
Why is self.uid2items_num * self.times needed? I looked at where self.times comes from: e.g. for pop100 it is 101 with CE and 100 with BPR.
uid2items_num is an array of length user_num. What is the logic behind writing the code this way? I don't quite understand it.
Also, once I have set eval_batch_size, why isn't the number of batches simply the total number of validation samples / eval_batch_size?
It seems there is no convenient way to control the real eval_batch_size.