
The selection among BATCH_SIZE, NEG_NUM and REAL_BATCH_SIZE

Open KevinXu-01 opened this issue 1 year ago • 1 comments

Dear author,

Thank you for your remarkable work. When I tried to reproduce the results, I ran into the same problem as issue #5. I first checked the model code but found no errors. Then, when I replaced REAL_BATCH_SIZE with BATCH_SIZE in train_iter and test_iter, I found that shape[0] of self.mask at line 285 of model.py became 1536 (= BATCH_SIZE * (NEG_NUM + 1)). So I checked the data_itrator module and realized why REAL_BATCH_SIZE was introduced.

mask_new = tf.slice(self.mask, [0, se_num], [batch_size, seq_len])

The problem is that in the default setting, where BATCH_SIZE = 256 and NEG_NUM = 5, REAL_BATCH_SIZE should be 42, but 42 * 6 = 252, which does not equal 256.

So I suggest choosing a combination of BATCH_SIZE, NEG_NUM and REAL_BATCH_SIZE such that BATCH_SIZE mod (NEG_NUM + 1) == 0 and REAL_BATCH_SIZE = BATCH_SIZE // (NEG_NUM + 1).

After changing NEG_NUM to 3, which gives BATCH_SIZE = 256 and REAL_BATCH_SIZE = 64, training started successfully.
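The rule above can be checked with a few lines of plain Python. This is only a minimal sketch: the helper names real_batch_size and valid_neg_nums are hypothetical and are not part of the repository, and it assumes that each real sample expands into NEG_NUM + 1 rows, as the 1536 = 256 * 6 shape suggests.

```python
def real_batch_size(batch_size: int, neg_num: int) -> int:
    """Return batch_size // (neg_num + 1), raising if the division is not exact.

    Hypothetical helper: assumes each real sample yields neg_num + 1 rows.
    """
    group = neg_num + 1
    if batch_size % group != 0:
        raise ValueError(
            f"BATCH_SIZE={batch_size} is not divisible by NEG_NUM + 1 = {group}; "
            f"{batch_size // group} * {group} = {(batch_size // group) * group}"
        )
    return batch_size // group


def valid_neg_nums(batch_size: int, max_neg_num: int) -> list:
    """List every NEG_NUM in 1..max_neg_num for which NEG_NUM + 1 divides batch_size."""
    return [n for n in range(1, max_neg_num + 1) if batch_size % (n + 1) == 0]


if __name__ == "__main__":
    print(real_batch_size(256, 3))   # NEG_NUM = 3 divides evenly
    print(valid_neg_nums(256, 10))   # candidate NEG_NUM values for BATCH_SIZE = 256
    try:
        real_batch_size(256, 5)      # default setting: 256 is not a multiple of 6
    except ValueError as err:
        print(err)
```

With BATCH_SIZE = 256, only values of NEG_NUM for which NEG_NUM + 1 is a power of two pass the check, which is why NEG_NUM = 3 works and the default NEG_NUM = 5 does not.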

KevinXu-01, Aug 03 '23 05:08

Hello, thank you for your suggestion. After changing NEG_NUM to 3, I get the following error:

tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas xGEMMBatched launch failed : a.shape=[256,100,100], b.shape=[256,100,16], m=100, n=16, k=100, batch_size=256
[[{{node MatMul_4}}]]
[[add_7/_49]]
(1) Internal: Blas xGEMMBatched launch failed : a.shape=[256,100,100], b.shape=[256,100,16], m=100, n=16, k=100, batch_size=256
[[{{node MatMul_4}}]]
0 successful operations.
0 derived errors ignored.

Have you run into the same problem? Thank you!

AnluckyER, Nov 06 '23 03:11