BERT-pytorch
Google AI 2018 BERT PyTorch implementation
Get random sentence for the next sentence prediction task: the random sentence should be drawn from random_file rather than from the original file being iterated.
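A minimal sketch of what the issue suggests: sample the negative sentence from a separately loaded corpus so the iterator over the original training file is left undisturbed. The function name and the corpus list are hypothetical, not from the repository's code.

```python
import random

def get_random_line(lines):
    """Return a random sentence for the NSP negative example.

    `lines` is assumed to be a list of sentences pre-loaded from a
    separately opened random_file, so sampling here never touches the
    file handle that iterates the original training file.
    """
    return lines[random.randrange(len(lines))]

# hypothetical corpus standing in for the contents of random_file
corpus = ["the cat sat", "dogs bark loudly", "rain fell all day"]
sentence = get_random_line(corpus)
```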
I believe there is something wrong here, even though I have never entered this if-clause during my experiments. https://github.com/codertimo/BERT-pytorch/blob/d10dc4f9d5a6f2ca74380f62039526eb7277c671/bert_pytorch/dataset/dataset.py#L17-L20 Are you trying to increment a NoneType? And I am wondering...
I have used these parameters: `bert -c /home/ai/LM_fit/bert/bert_pytorch/dataset/wiki_arabic.txt -v /home/ai/LM_fit/bert/bert_pytorch/dataset/wiki_vocab.small -o /home/ai/LM_fit/bert/bert_pytorch/dataset/wiki_model_cpu -hs 240 -l 3 -a 3 -s 30 -b 8 --on_memory False --with_cuda True`...
Hello. Thank you for sharing your code; it's marvellous. When I built my own vocab file, I found that tqdm does not show a progress bar, so I added some code...
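One likely cause is that tqdm is wrapping a bare file iterator, whose length it cannot know. A sketch of a fix under that assumption: pre-count the lines and pass `total=` so the bar renders. The `build_vocab` function and the demo file are illustrative, not the repository's actual vocab builder.

```python
from tqdm import tqdm

def count_lines(path):
    # Pre-count lines so tqdm can draw a real progress bar
    # instead of a bare iteration counter.
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)

def build_vocab(path):
    """Toy word-frequency counter standing in for the real vocab build."""
    counter = {}
    with open(path, encoding="utf-8") as f:
        for line in tqdm(f, total=count_lines(path), desc="Building vocab"):
            for token in line.split():
                counter[token] = counter.get(token, 0) + 1
    return counter

# tiny demo corpus written to a hypothetical local path
with open("demo_corpus.txt", "w", encoding="utf-8") as f:
    f.write("hello world\nhello bert\n")

vocab = build_vocab("demo_corpus.txt")
```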
Thank you very much for this great contribution. I found that the masked LM loss stops decreasing once it reaches a value of around 7. However, in the official TensorFlow implementation,...
The paper mentions: > Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my dog is hairy it chooses hairy. This means...
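A minimal sketch of the masking scheme the quoted passage describes, per the BERT paper's 80/10/10 rule: of the 15% of positions selected, 80% become `[MASK]`, 10% are replaced by a random token, and 10% are kept unchanged. The function and the tiny vocabulary are illustrative, not taken from this repository.

```python
import random

MASK = "[MASK]"
VOCAB = ["my", "dog", "is", "hairy", "cat"]  # toy vocabulary for random replacement

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels); labels hold the original token at
    each selected position and None elsewhere."""
    rng = random.Random(seed)
    out, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok
            p = rng.random()
            if p < 0.8:
                out[i] = MASK            # 80%: replace with [MASK]
            elif p < 0.9:
                out[i] = rng.choice(VOCAB)  # 10%: replace with a random token
            # else: 10%: keep the original token unchanged
    return out, labels
```

With `mask_prob=1.0` every position is selected, which makes the 80/10/10 split easy to inspect on a short sentence.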
Is it possible to achieve the same result as the paper in short time? Well.. I don't have enough GPU & computation power to see the enough result as google...
If the `d_model` integer parameter is odd rather than even, then the tensor sizes on the left and right sides of the assignment differ by 1: ```python...
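The mismatch is the classic sinusoidal positional-encoding pitfall: with an odd `d_model`, the even-indexed column slice `pe[:, 0::2]` has one more column than the odd-indexed slice `pe[:, 1::2]`, while the sine and cosine terms are both built from the same `div_term`. A NumPy sketch of one possible fix (slicing the cosine half), assumed rather than taken from the repository's PyTorch code:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding that also works for odd d_model.

    For odd d_model, pe[:, 0::2] has ceil(d_model / 2) columns but
    pe[:, 1::2] has only floor(d_model / 2), so the cosine half must be
    sliced to match; the even case is unaffected by the slice.
    """
    pe = np.zeros((max_len, d_model))
    position = np.arange(max_len)[:, None]
    div_term = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe[:, 0::2] = np.sin(position * div_term)
    pe[:, 1::2] = np.cos(position * div_term)[:, : d_model // 2]
    return pe
```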
Building the same corpus as the original paper: please share your tips for preprocessing and downloading the files. It would also be great to share the preprocessed data via Dropbox or Google Drive...