Junseong Kim

Results 46 comments of Junseong Kim

[bert-small-25-logs.txt](https://github.com/codertimo/BERT-pytorch/files/2517258/bert-small-25-logs.txt) This is the result on my 1M corpus after 1 epoch; anyone can review this result

@yangze01 just default params with batch size 128

I know, but the line length in my corpus is usually less than 10 for each sentence. And seq_len should be set properly by the user. I don't think it's...

@wenhaozheng-nju I did negative sampling https://github.com/codertimo/BERT-pytorch/blob/0d076e09fd5aef1601654fa0abfc2c7f0d57e5d9/bert_pytorch/dataset/dataset.py#L92-L99 https://github.com/codertimo/BERT-pytorch/blob/0d076e09fd5aef1601654fa0abfc2c7f0d57e5d9/bert_pytorch/dataset/dataset.py#L114-L125
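For readers who don't want to follow the permalinks, the next-sentence negative sampling referenced above can be sketched roughly like this (a minimal stand-in, not the repo's exact code: `random_sent` and the `(sent_a, sent_b)` pair layout here are illustrative assumptions):

```python
import random

def random_sent(pairs, index, rng=random):
    """Next-sentence sampling sketch for BERT pretraining.

    `pairs` is a list of (sent_a, sent_b) tuples, standing in for the
    tab-separated corpus lines the dataset class reads. With probability
    0.5 we keep the true pair (is_next label = 1); otherwise we replace
    the second sentence with one drawn from a random line (label = 0).
    """
    t1, t2 = pairs[index]
    if rng.random() > 0.5:
        return t1, t2, 1
    # Negative sample: second sentence taken from a random corpus line.
    rand_t2 = pairs[rng.randrange(len(pairs))][1]
    return t1, rand_t2, 0

# Hypothetical toy corpus for illustration only.
pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
```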

@wenhaozheng-nju hmmm, but do you think it's the main cause of this issue? I guess it's a model problem.

@wenhaozheng-nju Then do you think that if I change the negative sampling code as you requested, this issue could be figured out?

@jiqiujia Looks pretty awesome!! Can you share the full training logs as a file? And how big is your corpus?? I would like to know the details. Thank you for...

@jiqiujia I trained on my dataset for 10 hours last night, with dropout rate 0.0 (which is the same as no dropout) and dropout rate 0.1. Unfortunately, neither test loss converged.

@Kosuke-Szk Thank you for sharing your result with us. After I saw @Kosuke-Szk's result, I thought *"Isn't our model pretty small to train..?"* As you guys know, we...

@wangwei7175878 WOW, these are brilliant; this is a really huge step for us. Thank you for your effort and computation resources. Is there any result which used `weight_decay` with the default?...