Mayank Mishra
Can you provide a bit more detail? How did you launch the job? Is this a standalone job or a server deployment using the Makefile?
Could be due to a large number of input tokens.
It's working on my machine :) But that's weird; this shouldn't be read as a pipe operator. Which shell are you using?
Hi, `do_sample = True` with `top_k = 1` should be fine, but the correct way to do it is just `do_sample = False`. This is weird. I don't think this is...
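For clarity, something like this (a minimal sketch using the standard `transformers` generate API; the model name here is just a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# placeholder model; any causal LM behaves the same way here
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("Hello, my name is", return_tensors="pt")

# do_sample=True with top_k=1 degenerates to greedy decoding anyway,
# but do_sample=False states the intent directly
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```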
I am unsure about OPT's compatibility with DeepSpeed. But if it works, you can simply pass the `save_mp_checkpoint_path` parameter to the `init_inference` method. This will create a pre-sharded fp16 version (assuming it...
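Roughly like this (a sketch, assuming the model loads through `transformers`; the model name, `mp_size`, and the output path are placeholders):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# placeholder model; swap in whichever checkpoint you are serving
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-13b", torch_dtype=torch.float16
)

# init_inference shards the model across GPUs; save_mp_checkpoint_path
# writes the resulting pre-sharded fp16 checkpoint to disk so later
# launches can skip the re-sharding step
model = deepspeed.init_inference(
    model,
    mp_size=8,                       # placeholder: number of GPUs
    dtype=torch.float16,
    replace_with_kernel_inject=True,
    save_mp_checkpoint_path="/path/to/sharded-ckpt",  # placeholder path
)
```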
Also watch out for https://github.com/huggingface/transformers-bloom-inference/pull/37
If you don't have memory constraints (i.e. enough GPUs), I would encourage you to use fp16 since it is faster. int8/int4 will be much faster once DeepSpeed starts supporting their...
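i.e. it comes down to the `dtype` you pass at init (sketch only; model name and `mp_size` are placeholders):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1", torch_dtype=torch.float16
)

# fp16: fastest path with the current DeepSpeed inference kernels
engine = deepspeed.init_inference(
    model, mp_size=1, dtype=torch.float16, replace_with_kernel_inject=True
)

# int8 halves memory per GPU but isn't faster today; switch once
# DeepSpeed ships optimized int8/int4 inference kernels:
# engine = deepspeed.init_inference(
#     model, mp_size=1, dtype=torch.int8, replace_with_kernel_inject=True
# )
```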
This is a bug in DeepSpeed. Can you report it there? Also, FYI, DS-inference doesn't work with PyTorch 1.13.1 yet. I would suggest falling back to 1.12.1.
I am not really sure. I haven't seen this before, but it seems like CUDA is not able to compile some kernels in DeepSpeed. I am using CUDA 11.6 with 8x A100...