Junseong Kim

46 comments by Junseong Kim

@jeffra Could you check this issue? I think https://github.com/microsoft/DeepSpeed/pull/1899 introduced it.

I printed out all the variables in the function and found the main cause of this issue! ### 1. A zero-division error occurs on this line; sequence_length is 32880 https://github.com/microsoft/DeepSpeed/blob/89e37ef360dddf10bed996734784e290b9b5fc62/csrc/transformer/inference/csrc/softmax.cu#L386 ###...
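A minimal sketch of this failure mode, in Python for illustration. This is not the actual DeepSpeed kernel code; the names `MAX_THREADS`, `seqs_per_block`, and `stride` are hypothetical. It only shows how an integer division by a very large `sequence_length` (such as 32880) can silently produce 0, which then causes a zero-division further down:

```python
# Hypothetical sketch, not the real softmax.cu logic: illustrates how a
# fixed per-block budget divided by a large sequence_length becomes 0.
MAX_THREADS = 1024  # assumed per-block thread budget, for illustration only

def seqs_per_block(sequence_length: int) -> int:
    # 1024 // 32880 == 0: any later division by this value would raise
    # ZeroDivisionError (or divide by zero in CUDA C).
    return MAX_THREADS // sequence_length

def stride(sequence_length: int) -> int:
    per_block = seqs_per_block(sequence_length)
    if per_block == 0:
        # Guard: fall back to one sequence per block instead of dividing by 0.
        per_block = 1
    return MAX_THREADS // per_block
```

With `sequence_length = 32880`, `seqs_per_block` returns 0, so an unguarded `MAX_THREADS // seqs_per_block(...)` would crash; the guard above is one possible mitigation.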

@RezaYazdaniAminabadi Hi! I hit the same issue with a GPT model that takes padded input_ids. Is there any update on this issue?

I mean it could be **two pieces of one sentence**, not necessarily a real sentence. Well, it doesn't matter whether it's two sentences or one. And this example came out...

@Yang92to Great point! I'll check out the BERT positional embedding method and update ASAP.

Hmm, interesting... Is this the result from version 0.0.1a4? And how did you print out that result?

@cairoHy Wow, thank you for your smart analysis. I just fixed this issue on the [0.0.1a5](https://github.com/codertimo/BERT-pytorch/tree/alpha0.0.1a5) version branch. The changes are here: https://github.com/codertimo/BERT-pytorch/blob/2a0b28218f4fde216cbb7750eb584c2ada0d487b/bert_pytorch/trainer/pretrain.py#L61-L62 https://github.com/codertimo/BERT-pytorch/blob/2a0b28218f4fde216cbb7750eb584c2ada0d487b/bert_pytorch/trainer/pretrain.py#L98-L102

Thanks to everyone who joined this investigation :) It was totally my fault, and I'm sorry for the inconvenience during the bug fixing. Additionally, is there anyone who can test the new code with...

@jiqiujia Can you share more details, like figures or logs?