Grégory Châtel
Hi! I think the information you are looking for is in the readme file: https://github.com/google-research/bert#learning-a-new-wordpiece-vocabulary
The last commit is from October :( Andrej is probably busy with a lot of other things. That's too bad; Arxiv Sanity is my main way to stay updated with...
Thanks a lot for taking the time to fix this issue! I will close it once both tabs are up and running.
I've been monitoring the top hype tab for the past few days and the results seem pretty strange. It is updating but the content seems wrong. For example today (2019-04-08)...
Hi @Franck-Dernoncourt, a few people are working on a [PyTorch version](https://github.com/huggingface/pytorch-openai-transformer-lm) of this code. We have recently added some classes that allow using this model for other tasks. You can...
A discussion about this code is happening in #11.
This part of `xmb` is used for the learned positional encoding. The network associates an embedding vector with each position of the input, and this vector is...
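For reference, here is a minimal sketch of what a learned positional encoding looks like in PyTorch. The class and variable names are illustrative, not the ones used in the repo:

```python
import torch
import torch.nn as nn

class LearnedPositionalEncoding(nn.Module):
    """Illustrative learned positional encoding (not the repo's exact class)."""

    def __init__(self, n_ctx, n_embd):
        super().__init__()
        # One trainable embedding vector per input position.
        self.pos_emb = nn.Embedding(n_ctx, n_embd)

    def forward(self, token_emb):
        # token_emb: (batch, seq_len, n_embd)
        seq_len = token_emb.size(1)
        positions = torch.arange(seq_len, device=token_emb.device)
        # The position vectors are simply added to the token embeddings.
        return token_emb + self.pos_emb(positions)
```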
@teucer @BangLiu Have you tried an even higher `lm_coef`? If you want to reduce overfitting, you may also want to give the network an additional task to complete (multi-task learning)....
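A rough sketch of how such an auxiliary language-modelling loss can be mixed with the main task loss; the tensors below are dummy placeholders, and `lm_coef` simply weights the auxiliary objective:

```python
import torch
import torch.nn as nn

lm_coef = 0.5  # weight of the auxiliary LM objective

# Dummy logits/labels standing in for the outputs of a real training step.
clf_logits = torch.randn(8, 2, requires_grad=True)           # (batch, n_classes)
clf_labels = torch.randint(0, 2, (8,))
lm_logits = torch.randn(8 * 77, 40478, requires_grad=True)   # (batch * seq, vocab)
lm_labels = torch.randint(0, 40478, (8 * 77,))

ce = nn.CrossEntropyLoss()
clf_loss = ce(clf_logits, clf_labels)
lm_loss = ce(lm_logits, lm_labels)

# A larger lm_coef puts more weight on the auxiliary task, which can act
# as a regulariser when the main task overfits.
total_loss = clf_loss + lm_coef * lm_loss
total_loss.backward()
```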
Have you tried plugging an `LMHead` into a `TransformerModel` and using `nn.CrossEntropyLoss` to train it? If I am not missing any important concept, that should do the trick.
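The wiring would look roughly like the sketch below. The real `TransformerModel` and `LMHead` classes live in the repo's `model_pytorch.py`; the tiny stand-ins here only exist to make the example self-contained (and a causal attention mask is omitted for brevity):

```python
import torch
import torch.nn as nn

class TinyTransformerBody(nn.Module):
    """Stand-in for TransformerModel: tokens -> per-position hidden states."""

    def __init__(self, vocab_size, n_embd):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, n_embd)
        self.block = nn.TransformerEncoderLayer(n_embd, nhead=4, batch_first=True)

    def forward(self, tokens):                 # (batch, seq)
        return self.block(self.embed(tokens))  # (batch, seq, n_embd)

class TinyLMHead(nn.Module):
    """Stand-in for LMHead: hidden states -> vocabulary logits."""

    def __init__(self, vocab_size, n_embd):
        super().__init__()
        self.decoder = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, hidden):                 # (batch, seq, n_embd)
        return self.decoder(hidden)            # (batch, seq, vocab)

vocab_size, n_embd = 1000, 64
body, lm_head = TinyTransformerBody(vocab_size, n_embd), TinyLMHead(vocab_size, n_embd)
criterion = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (2, 16))
logits = lm_head(body(tokens))
# Standard next-token objective: predict token t+1 from positions up to t.
loss = criterion(logits[:, :-1].reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
loss.backward()
```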
When the information reaches the classification head, it has one vector of dimension `n_embd` associated with each position of each input. If you want to get a single prediction for...
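One common way to collapse these per-position vectors into a single prediction per input is to pick the vector at the position of the last (classification) token and feed it to a linear layer. A small sketch, with placeholder shapes and names:

```python
import torch
import torch.nn as nn

batch, seq_len, n_embd, n_classes = 4, 16, 64, 2
hidden = torch.randn(batch, seq_len, n_embd)      # transformer output, one vector per position
lengths = torch.tensor([16, 12, 9, 16])            # true length of each input sequence

# Gather the hidden state at the last real token of each sequence.
idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, n_embd)
pooled = hidden.gather(1, idx).squeeze(1)          # (batch, n_embd)

clf = nn.Linear(n_embd, n_classes)
logits = clf(pooled)                               # one prediction per input
```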