Mayank Mishra

187 comments by Mayank Mishra

Can you provide a bit more detail? How did you launch the job? Is this a standalone job or a server deployment using the Makefile?

Could be due to a large number of input tokens.

It's working on my machine :) But that's weird; this shouldn't be read as a pipe operator. Which shell are you using?

Hi, `do_sample=True` and `top_k=1` should be fine, but the correct way to do it is just `do_sample=False`. This is weird. I don't think this is...
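To see why the two settings should behave the same, here is a toy sketch in plain Python (not the transformers implementation) of top-k sampling: with `k=1`, only the argmax token survives the filter, so "sampling" becomes deterministic greedy decoding.

```python
import math
import random

def top_k_sample(logits, k, rng):
    # keep only the indices of the k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i])[-k:]
    # softmax over the surviving logits, then sample one index
    weights = [math.exp(logits[i]) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

logits = [0.1, 2.5, -1.0, 0.7]
rng = random.Random(0)

# with k == 1 only the argmax survives, so the "sample" is always
# the greedy choice, regardless of the random seed
assert top_k_sample(logits, 1, rng) == max(range(len(logits)), key=lambda i: logits[i])
```

Setting `do_sample=False` just skips the sampling machinery and takes the argmax directly, which is why it is the cleaner way to ask for greedy decoding.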

I am unsure about OPT's compatibility with DeepSpeed. But if it works, you can simply pass the `save_mp_checkpoint_path` parameter to the `init_inference` method. This will create a pre-sharded fp16 version (assuming it...
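For reference, a minimal sketch of what that could look like. The model name, tensor-parallel degree, and output path are placeholder assumptions; this needs a multi-GPU machine and is run under the `deepspeed` launcher, so treat it as an outline rather than a drop-in script.

```python
# Sketch only: requires GPUs plus deepspeed and transformers installed.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# hypothetical model choice for illustration
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b", torch_dtype=torch.float16
)

# init_inference shards the model across mp_size GPUs; passing
# save_mp_checkpoint_path additionally writes the pre-sharded fp16
# checkpoint to disk so later launches can skip the sharding step.
model = deepspeed.init_inference(
    model,
    mp_size=2,                                   # tensor-parallel degree (assumption)
    dtype=torch.float16,
    replace_with_kernel_inject=True,
    save_mp_checkpoint_path="/tmp/opt-sharded",  # hypothetical output path
)
```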

Also watch out for https://github.com/huggingface/transformers-bloom-inference/pull/37

If you don't have memory constraints (number of GPUs), I would encourage you to use fp16 since it is faster. int8/int4 will be much faster once DeepSpeed starts supporting their...

This is a bug in DeepSpeed. Can you report it there? Also, FYI, DS-inference doesn't work with PyTorch 1.13.1 yet. I would suggest falling back to 1.12.1.

I am not really sure. I haven't seen this before, but it seems like CUDA is unable to compile some of DeepSpeed's kernels. I am using CUDA 11.6 with 8x A100...