MGNM
The selection among BATCH_SIZE, NEG_NUM and REAL_BATCH_SIZE
Dear author,
Thank you for your remarkable work.
When I tried to reproduce the results, I ran into the same problem as issue #5. I first checked the model code but found no obvious errors.
Then, when I replaced REAL_BATCH_SIZE with BATCH_SIZE in train_iter and test_iter, shape[0] of self.mask at line 285 of model.py became 1536 (= BATCH_SIZE * (NEG_NUM + 1)). So I checked the data_itrator module and realized why REAL_BATCH_SIZE was introduced.
mask_new = tf.slice(self.mask, [0, se_num], [batch_size, seq_len])
Now, the problem is that in the default setting, where BATCH_SIZE = 256 and NEG_NUM = 5, REAL_BATCH_SIZE should be 42, but 42 * 6 = 252, which does not equal 256. So I suggest choosing a combination of BATCH_SIZE, NEG_NUM and REAL_BATCH_SIZE where
BATCH_SIZE mod (NEG_NUM + 1) == 0
and
REAL_BATCH_SIZE = BATCH_SIZE // (NEG_NUM + 1)
After changing NEG_NUM to 3, which gives BATCH_SIZE = 256 and REAL_BATCH_SIZE = 64, I was able to start training.
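For anyone who wants to check their own settings, the constraint above can be sketched in a few lines of Python. This is only a minimal sketch of the arithmetic, not code from the repository: the helper names real_batch_size and valid_neg_nums, and the max_neg_num parameter, are hypothetical and chosen just for illustration.

```python
def real_batch_size(batch_size: int, neg_num: int) -> int:
    """Return BATCH_SIZE // (NEG_NUM + 1), rejecting combinations that do not divide evenly.

    Hypothetical helper for illustration; it is not part of the MGNM code.
    """
    group = neg_num + 1  # one positive sample plus NEG_NUM negatives
    if batch_size % group != 0:
        raise ValueError(
            f"BATCH_SIZE={batch_size} is not divisible by NEG_NUM + 1 = {group}; "
            f"{batch_size // group} * {group} = {(batch_size // group) * group}"
        )
    return batch_size // group


def valid_neg_nums(batch_size: int, max_neg_num: int = 10) -> list:
    """List every NEG_NUM in [1, max_neg_num] for which BATCH_SIZE divides evenly."""
    return [n for n in range(1, max_neg_num + 1) if batch_size % (n + 1) == 0]


if __name__ == "__main__":
    print(valid_neg_nums(256))      # NEG_NUM values compatible with BATCH_SIZE = 256
    print(real_batch_size(256, 3))  # the setting that worked for me
    try:
        real_batch_size(256, 5)     # the default setting
    except ValueError as err:
        print(err)
```

With BATCH_SIZE = 256, the default NEG_NUM = 5 is rejected because 256 is not a multiple of 6, while NEG_NUM = 3 passes and gives REAL_BATCH_SIZE = 64.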
Hello, thank you for your suggestion. I changed NEG_NUM to 3 but got the following error:
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas xGEMMBatched launch failed : a.shape=[256,100,100], b.shape=[256,100,16], m=100, n=16, k=100, batch_size=256
[[{{node MatMul_4}}]]
[[add_7/_49]]
(1) Internal: Blas xGEMMBatched launch failed : a.shape=[256,100,100], b.shape=[256,100,16], m=100, n=16, k=100, batch_size=256
[[{{node MatMul_4}}]]
0 successful operations.
0 derived errors ignored.
Did you run into the same problem? Thank you!