Grégory Châtel
Hi! I think the information you are looking for is in the readme file: https://github.com/google-research/bert#learning-a-new-wordpiece-vocabulary
The last commit is from October :( Andrej is probably busy with a lot of other things. That's too bad; Arxiv Sanity is my main way to stay updated with...
Thanks a lot for taking the time to fix this issue! I will close it once both tabs are up and running.
I've been monitoring the top hype tab for the past few days and the results seem pretty strange. It is updating but the content seems wrong. For example today (2019-04-08)...
Hi @Franck-Dernoncourt, a few people are working on a [PyTorch version](https://github.com/huggingface/pytorch-openai-transformer-lm) of this code. We have recently added some classes that allow using this model for other tasks. You can...
A discussion about this code is happening in #11.
This part of `xmb` is used for the learned positional encoding. The network associates an embedding vector with each position of the input, and this vector is...
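For reference, here is a minimal sketch of what a learned positional encoding looks like in PyTorch. The class and variable names are illustrative, not the ones used in the repo:

```python
import torch
import torch.nn as nn

class LearnedPositionalEncoding(nn.Module):
    """Illustrative learned positional encoding (not the repo's exact class)."""

    def __init__(self, n_ctx, n_embd):
        super().__init__()
        # One trainable embedding vector per input position.
        self.pos_emb = nn.Embedding(n_ctx, n_embd)

    def forward(self, token_emb):
        # token_emb: (batch, seq_len, n_embd)
        seq_len = token_emb.size(1)
        positions = torch.arange(seq_len, device=token_emb.device)
        # The position vectors are simply added to the token embeddings.
        return token_emb + self.pos_emb(positions)
```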
@teucer @BangLiu Have you tried an even higher `lm_coef`? If you want to reduce overfitting, you may also want to give the network an additional task to complete (multi-task learning)....
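A rough sketch of how such an auxiliary language-modelling loss can be mixed with the main task loss; the tensors below are dummy placeholders, and `lm_coef` simply weights the auxiliary objective:

```python
import torch
import torch.nn as nn

lm_coef = 0.5  # weight of the auxiliary LM objective

# Dummy logits/labels standing in for the outputs of a real training step.
clf_logits = torch.randn(8, 2, requires_grad=True)           # (batch, n_classes)
clf_labels = torch.randint(0, 2, (8,))
lm_logits = torch.randn(8 * 77, 40478, requires_grad=True)   # (batch * seq, vocab)
lm_labels = torch.randint(0, 40478, (8 * 77,))

ce = nn.CrossEntropyLoss()
clf_loss = ce(clf_logits, clf_labels)
lm_loss = ce(lm_logits, lm_labels)

# A larger lm_coef puts more weight on the auxiliary task, which can act
# as a regulariser when the main task overfits.
total_loss = clf_loss + lm_coef * lm_loss
total_loss.backward()
```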
Have you tried plugging an `LMHead` into a `TransformerModel` and using `nn.CrossEntropyLoss` to train it? If I am not missing any important concept, that should do the trick.
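The wiring would look roughly like the sketch below. The real `TransformerModel` and `LMHead` classes live in the repo's `model_pytorch.py`; the tiny stand-ins here only exist to make the example self-contained (and a causal attention mask is omitted for brevity):

```python
import torch
import torch.nn as nn

class TinyTransformerBody(nn.Module):
    """Stand-in for TransformerModel: tokens -> per-position hidden states."""

    def __init__(self, vocab_size, n_embd):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, n_embd)
        self.block = nn.TransformerEncoderLayer(n_embd, nhead=4, batch_first=True)

    def forward(self, tokens):                 # (batch, seq)
        return self.block(self.embed(tokens))  # (batch, seq, n_embd)

class TinyLMHead(nn.Module):
    """Stand-in for LMHead: hidden states -> vocabulary logits."""

    def __init__(self, vocab_size, n_embd):
        super().__init__()
        self.decoder = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, hidden):                 # (batch, seq, n_embd)
        return self.decoder(hidden)            # (batch, seq, vocab)

vocab_size, n_embd = 1000, 64
body, lm_head = TinyTransformerBody(vocab_size, n_embd), TinyLMHead(vocab_size, n_embd)
criterion = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (2, 16))
logits = lm_head(body(tokens))
# Standard next-token objective: predict token t+1 from positions up to t.
loss = criterion(logits[:, :-1].reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
loss.backward()
```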
When the information reaches the classification head, it has one vector of dimension `n_embd` associated with each position of each input. If you want to get a single prediction for...
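One common way to collapse these per-position vectors into a single prediction per input is to pick the vector at the position of the last (classification) token and feed it to a linear layer. A small sketch, with placeholder shapes and names:

```python
import torch
import torch.nn as nn

batch, seq_len, n_embd, n_classes = 4, 16, 64, 2
hidden = torch.randn(batch, seq_len, n_embd)      # transformer output, one vector per position
lengths = torch.tensor([16, 12, 9, 16])            # true length of each input sequence

# Gather the hidden state at the last real token of each sequence.
idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, n_embd)
pooled = hidden.gather(1, idx).squeeze(1)          # (batch, n_embd)

clf = nn.Linear(n_embd, n_classes)
logits = clf(pooled)                               # one prediction per input
```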