Mayank Mishra

Results 187 comments of Mayank Mishra

I only see 4 processes in the yaml ^^ You can always enable CPU offloading.
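A minimal sketch of what enabling CPU offloading can look like in a DeepSpeed config, assuming ZeRO stage 3 is in use (the exact stage and batch settings here are illustrative, not taken from the yaml being discussed):

```python
# Hypothetical DeepSpeed config dict enabling ZeRO-3 CPU offloading.
# Field names follow the DeepSpeed JSON config schema; values are examples.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        # Move parameters and optimizer state to CPU to reduce GPU memory.
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}
```

The same keys can be written in a JSON file and passed via `--deepspeed_config`.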

I think this issue needs revisiting @tjruwase. This is very much needed for a lot of transformer models.

@TingchenFu The size mismatch looks a bit weird to me. I have not seen this before. The following is how I load it; it's a bit unclean, but it works...

`max_split_size_mb` won't work with DeepSpeed inference, I think. It only applies to native PyTorch code.
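For the plain PyTorch path, `max_split_size_mb` is set through the `PYTORCH_CUDA_ALLOC_CONF` environment variable before `torch` is imported; a minimal sketch (the 128 MB value is an arbitrary example):

```python
import os

# Configure PyTorch's native CUDA caching allocator. This must be set
# before importing torch, and has no effect on DeepSpeed-inference kernels.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # allocator reads the env var at import/init time
```

The same thing can be done from the shell with `export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` before launching the script.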

@younesbelkada related issue that we had closed before: https://github.com/huggingface/transformers/issues/18809

I don't think that's the case. I will try to run this on my end :)

Hey, no specific reason. It's mostly to dig into the code and the optimizations done by the DeepSpeed team. Is it not openly available?

Init inference is fine, it's in forward @mrwyattii

@RezaYazdaniAminabadi @mrwyattii @jeffra https://github.com/bigcode-project/bigcode-inference-benchmark You can run:

```shell
sh scripts/run_batch_size.sh ds-inference-1b-bloom-fp16
```

This will run BLOOM 1.3B (randomly initialized) using DS-inference in fp16 at batch sizes 1 to 128 (doubled...