Junseong Kim

46 comments by Junseong Kim

@jeffra Could you check this issue? I think https://github.com/microsoft/DeepSpeed/pull/1899 introduced it.

I printed out all the variables in the function and found the main cause of this issue! ### 1. A zero-division error occurs on this line; sequence_length is 32880 https://github.com/microsoft/DeepSpeed/blob/89e37ef360dddf10bed996734784e290b9b5fc62/csrc/transformer/inference/csrc/softmax.cu#L386 ###...
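A minimal sketch of this failure mode, in Python for illustration. This is not the actual DeepSpeed kernel code; the names `MAX_THREADS`, `seqs_per_block`, and `stride` are hypothetical. It only shows how an integer division by a very large `sequence_length` (such as 32880) can silently produce 0, which then causes a zero-division further down:

```python
# Hypothetical sketch, not the real softmax.cu logic: illustrates how a
# fixed per-block budget divided by a large sequence_length becomes 0.
MAX_THREADS = 1024  # assumed per-block thread budget, for illustration only

def seqs_per_block(sequence_length: int) -> int:
    # 1024 // 32880 == 0: any later division by this value would raise
    # ZeroDivisionError (or divide by zero in CUDA C).
    return MAX_THREADS // sequence_length

def stride(sequence_length: int) -> int:
    per_block = seqs_per_block(sequence_length)
    if per_block == 0:
        # Guard: fall back to one sequence per block instead of dividing by 0.
        per_block = 1
    return MAX_THREADS // per_block
```

With `sequence_length = 32880`, `seqs_per_block` returns 0, so an unguarded `MAX_THREADS // seqs_per_block(...)` would crash; the guard above is one possible mitigation.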

@RezaYazdaniAminabadi Hi! I hit the same issue with a GPT model that takes padded input_ids. Is there any update on this issue?

I mean it could be **two pieces of one sentence**, not necessarily a real sentence. Well, it doesn't matter whether it's two sentences or one. And this example came out...

@Yang92to Great point! I'll check out the BERT positional embedding method and update ASAP.

Hmm, interesting... Is this the result from version 0.0.1a4? And how did you print out that result?

@cairoHy Wow, thank you for your smart analysis. I just fixed this issue on the [0.0.1a5](https://github.com/codertimo/BERT-pytorch/tree/alpha0.0.1a5) version branch. The changes are here: https://github.com/codertimo/BERT-pytorch/blob/2a0b28218f4fde216cbb7750eb584c2ada0d487b/bert_pytorch/trainer/pretrain.py#L61-L62 https://github.com/codertimo/BERT-pytorch/blob/2a0b28218f4fde216cbb7750eb584c2ada0d487b/bert_pytorch/trainer/pretrain.py#L98-L102

Thanks to everyone who joined this investigation :) It was totally my fault, and I'm sorry for the inconvenience during the bug fixing. Additionally, is there anyone who can test the new code with...

@jiqiujia Can you share more details, like figures or logs?