finetune-transformer-lm
Question about the shape of `X_train`
X_train = tf.placeholder(tf.int32, [n_batch_train, 2, n_ctx, 2])
xmb[:, :, :, 1] = np.arange(n_vocab+n_special, n_vocab+n_special+n_ctx)
Why is there a channel of additional tokens?
Problem solved! This part of `xmb` is used for the learned positional encoding. https://github.com/huggingface/pytorch-openai-transformer-lm/issues/12#issuecomment-401770634
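For anyone else landing here, this is a minimal NumPy sketch of why the extra channel exists (the sizes are made up, and I drop the second axis of size 2 that holds the two candidate sequences, so this is a simplification rather than the repo's exact code): the model uses a single embedding matrix whose rows cover the vocabulary, the special tokens, and the `n_ctx` positions, so looking up both channels and summing over the channel axis adds a learned positional embedding to each token embedding.

```python
import numpy as np

# Illustrative sizes (hypothetical, not the repo's actual hyperparameters)
n_vocab, n_special, n_ctx, n_embd = 100, 3, 8, 16
n_batch, seq_len = 2, 5  # seq_len <= n_ctx

# One embedding table covers tokens, special tokens, AND positions,
# so the position embeddings are simply extra rows past the vocabulary.
we = np.random.randn(n_vocab + n_special + n_ctx, n_embd) * 0.02

# Build xmb like the question's snippet: channel 0 = token ids,
# channel 1 = position ids offset past the vocabulary + special tokens.
xmb = np.zeros((n_batch, n_ctx, 2), dtype=np.int32)
xmb[:, :seq_len, 0] = np.random.randint(0, n_vocab, size=(n_batch, seq_len))
xmb[:, :, 1] = np.arange(n_vocab + n_special, n_vocab + n_special + n_ctx)

# Embedding lookup on both channels, then sum over the channel axis:
# each token embedding gets its learned positional embedding added.
e = we[xmb]        # (n_batch, n_ctx, 2, n_embd)
h = e.sum(axis=2)  # (n_batch, n_ctx, n_embd)
print(h.shape)     # (2, 8, 16)
```

If I read train.py right, the remaining axis of size 2 in `X_train` (the one I dropped above) holds the two candidate story endings of the ROCStories task, while the trailing axis of size 2 is the token/position channel pair shown here.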