misska1

12 comments by misska1

I am looking forward to your support for GPT-J-6B too!😄

I had the same issue:

    ====================START warmup====================
    =========lightseq=========
    lightseq generating...
    Traceback (most recent call last):
      File "test/ls_bart.py", line 102, in <module>
        main()
      File "test/ls_bart.py", line 83, in main
        warmup(tokenizer, ls_model, hf_model,...

> You can convert the HF checkpoints back to Megatron-DeepSpeed. See this (a bit hacky) script: https://gist.github.com/malteos/c194368594e16439c101b7bf27195fd1

@malteos Thank you for your answer! However, in your code, I need...
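For context, a conversion like that gist's boils down to loading the HF state dict, renaming the tensors into Megatron-DeepSpeed's layout, and re-saving them as `model_states` files. A minimal sketch, assuming a hypothetical `KEY_MAP` and a single unsharded output file (the actual script also handles tensor-parallel slicing and per-layer sharding):

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical mapping from HF parameter names to Megatron-DeepSpeed
# names; the real script derives this per layer.
KEY_MAP = {
    "transformer.word_embeddings.weight": "word_embeddings.weight",
    # ... one entry per parameter ...
}

def convert(hf_name: str, out_path: str) -> None:
    # Load the HF checkpoint and pull out its flat state dict.
    hf_sd = AutoModelForCausalLM.from_pretrained(hf_name).state_dict()

    # Rename tensors into the Megatron-DeepSpeed layout.
    meg_sd = {KEY_MAP[k]: v for k, v in hf_sd.items() if k in KEY_MAP}

    # Megatron-DeepSpeed actually expects checkpoints sharded into
    # layer_XX-model_YY-model_states.pt files; a single file is shown
    # here purely for illustration.
    torch.save(meg_sd, out_path)

convert("bigscience/bloom-560m", "layer_00-model_00-model_states.pt")
```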

> the initial topology conversion was written for BF16Optimizer, but here you use zero stage=1, which I haven't worked with, so I have no experience with this use-case.
>
> ...
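For anyone hitting the same mismatch: the BF16Optimizer path and plain ZeRO stage 1 are selected by different DeepSpeed config blocks. A rough sketch of the two variants as Python config dicts (the keys are standard DeepSpeed options; the batch-size values are placeholders):

```python
# Config shape the topology conversion was written against: the
# BF16Optimizer is enabled via the "bf16" block.
ds_config_bf16 = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "bf16": {"enabled": True},
}

# The setup in question instead runs plain ZeRO stage 1 (here with
# fp16), which the conversion script was not written for.
ds_config_zero1 = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 1},
}
```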

![Screenshot 2022-09-13 7.28.26 PM](https://user-images.githubusercontent.com/22286006/189889787-cb93bc0b-9ba8-4dd7-ab06-0ae59698596e.png) I cannot build this tokenizer from Rust, so I use tokenizer=0.12.0 instead. Does this matter? @stas00
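A quick way to check whether a pip-installed `tokenizers` wheel behaves sensibly without building it from the Rust sources. A minimal sketch, assuming the checkpoint ships a `tokenizer.json`:

```python
import tokenizers
from tokenizers import Tokenizer

# Confirm which prebuilt wheel is actually installed.
print(tokenizers.__version__)  # e.g. "0.12.0"

# Load the serialized tokenizer and round-trip a sample string to
# check that encode/decode behave as expected.
tok = Tokenizer.from_file("tokenizer.json")
enc = tok.encode("Hello world")
print(enc.tokens)
print(tok.decode(enc.ids))
```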

> Honestly I'm not sure as I wasn't part of the data team. I remember they said that most likely the normal tokenizer should work, but it might be safer...

> I think the first grad norms that are 0 are linked to overflow, so essentially we drop the batch and reduce the loss factor. So everything should be normal,...
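The skip-on-overflow behaviour described there is standard dynamic loss scaling; PyTorch's `GradScaler` implements the same logic. A minimal sketch with a placeholder model and optimizer:

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    optimizer.zero_grad()
    with torch.autocast("cuda", dtype=torch.float16):
        loss = model(torch.randn(4, 8, device="cuda")).pow(2).mean()
    scaler.scale(loss).backward()
    # If any grad is inf/nan (fp16 overflow), step() silently skips
    # the update and update() halves the loss scale -- the "dropped
    # batch" and zero grad norm seen in the training logs.
    scaler.step(optimizer)
    scaler.update()
```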