misska1

12 comments by misska1

I am looking forward to your support for GPT-J-6B too!😄

I had the same issue:

    ====================START warmup====================
    =========lightseq=========
    lightseq generating...
    Traceback (most recent call last):
      File "test/ls_bart.py", line 102, in <module>
        main()
      File "test/ls_bart.py", line 83, in main
        warmup(tokenizer, ls_model, hf_model,...

> You can convert the HF checkpoints back to Megatron-DeepSpeed. See this (a bit hacky) script: https://gist.github.com/malteos/c194368594e16439c101b7bf27195fd1

@malteos Thank you for your answer! However, in your code, I need...
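For context, a conversion like that gist's boils down to loading the HF state dict, renaming the tensors into Megatron-DeepSpeed's layout, and re-saving them as `model_states` files. A minimal sketch, assuming a hypothetical `KEY_MAP` and a single unsharded output file (the actual script also handles tensor-parallel slicing and per-layer sharding):

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical mapping from HF parameter names to Megatron-DeepSpeed
# names; the real script derives this per layer.
KEY_MAP = {
    "transformer.word_embeddings.weight": "word_embeddings.weight",
    # ... one entry per parameter ...
}

def convert(hf_name: str, out_path: str) -> None:
    # Load the HF checkpoint and pull out its flat state dict.
    hf_sd = AutoModelForCausalLM.from_pretrained(hf_name).state_dict()

    # Rename tensors into the Megatron-DeepSpeed layout.
    meg_sd = {KEY_MAP[k]: v for k, v in hf_sd.items() if k in KEY_MAP}

    # Megatron-DeepSpeed actually expects checkpoints sharded into
    # layer_XX-model_YY-model_states.pt files; a single file is shown
    # here purely for illustration.
    torch.save(meg_sd, out_path)

convert("bigscience/bloom-560m", "layer_00-model_00-model_states.pt")
```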

> the initial topology conversion was written for BF16Optimizer, but here you use zero stage=1, which I haven't worked with, so I have no experience with this use-case.
>
> ...
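For anyone hitting the same mismatch: the BF16Optimizer path and plain ZeRO stage 1 are selected by different DeepSpeed config blocks. A rough sketch of the two variants as Python config dicts (the keys are standard DeepSpeed options; the batch-size values are placeholders):

```python
# Config shape the topology conversion was written against: the
# BF16Optimizer is enabled via the "bf16" block.
ds_config_bf16 = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "bf16": {"enabled": True},
}

# The setup in question instead runs plain ZeRO stage 1 (here with
# fp16), which the conversion script was not written for.
ds_config_zero1 = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 1},
}
```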

![Screenshot 2022-09-13 7.28.26 PM](https://user-images.githubusercontent.com/22286006/189889787-cb93bc0b-9ba8-4dd7-ab06-0ae59698596e.png) I cannot build this tokenizer from Rust, so I use tokenizer=0.12.0 instead. Does this matter? @stas00
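A quick way to check whether a pip-installed `tokenizers` wheel behaves sensibly without building it from the Rust sources. A minimal sketch, assuming the checkpoint ships a `tokenizer.json`:

```python
import tokenizers
from tokenizers import Tokenizer

# Confirm which prebuilt wheel is actually installed.
print(tokenizers.__version__)  # e.g. "0.12.0"

# Load the serialized tokenizer and round-trip a sample string to
# check that encode/decode behave as expected.
tok = Tokenizer.from_file("tokenizer.json")
enc = tok.encode("Hello world")
print(enc.tokens)
print(tok.decode(enc.ids))
```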

> Honestly I'm not sure as I wasn't part of the data team. I remember they said that most likely the normal tokenizer should work, but it might be safer...

> I think the first grad norms that are 0 are linked to overflow, so essentially we drop the batch and reduce the loss factor. So everything should be normal,...
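The skip-on-overflow behaviour described there is standard dynamic loss scaling; PyTorch's `GradScaler` implements the same logic. A minimal sketch with a placeholder model and optimizer:

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    optimizer.zero_grad()
    with torch.autocast("cuda", dtype=torch.float16):
        loss = model(torch.randn(4, 8, device="cuda")).pow(2).mean()
    scaler.scale(loss).backward()
    # If any grad is inf/nan (fp16 overflow), step() silently skips
    # the update and update() halves the loss scale -- the "dropped
    # batch" and zero grad norm seen in the training logs.
    scaler.step(optimizer)
    scaler.update()
```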