Mayank Mishra
I only see 4 processes in the yaml ^^ You can always enable CPU offloading.
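(For reference, a minimal sketch of what enabling CPU offloading looks like in a DeepSpeed config; the ZeRO-3 offload keys are standard DeepSpeed config fields, but the stage and batch-size values here are placeholders, not taken from the thread.)

```python
# Minimal sketch of a DeepSpeed config dict with CPU offloading enabled.
# Stage 3 shards parameters; offload_param / offload_optimizer push them to CPU.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}
```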
I think this issue needs revisiting @tjruwase. This is very much needed for a lot of transformer models.
@TingchenFu The size mismatch looks a bit weird to me. I have not seen this before. The following is how I load it, it's a bit unclean but it works...
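(The snippet itself is truncated above; a minimal sketch along the same lines, assuming a BLOOM checkpoint loaded through transformers and wrapped with `deepspeed.init_inference` — the model name and settings are illustrative, not the exact code from the thread.)

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; swap in the model you are actually loading.
model_name = "bigscience/bloom-1b3"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load weights in fp16 first, then hand the module to DeepSpeed.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Wrap for DS-inference; kernel injection swaps in DeepSpeed's fused kernels.
model = deepspeed.init_inference(
    model,
    mp_size=1,                       # tensor-parallel degree (placeholder)
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
```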
This is weird. I'll look into this one.
`max_split_size_mb` won't work with DeepSpeed inference, I think. It only applies to pure native PyTorch code.
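(For context, in plain PyTorch this knob is set through the caching allocator's env var; a minimal sketch, the 128 value is just an example.)

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be set before CUDA is first initialized;
# max_split_size_mb caps the block size the caching allocator will split,
# which can reduce fragmentation-driven OOMs.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value

import torch  # imported after setting the env var so the allocator picks it up
```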
@younesbelkada related issue that we had closed before: https://github.com/huggingface/transformers/issues/18809
I don't think that's the case. I will try to run this on my end :)
Hey, no specific reason. It's mostly to dig into the code and the optimizations done by the DeepSpeed team. Is it not openly available?
`init_inference` is fine, it's in `forward` @mrwyattii
@RezaYazdaniAminabadi @mrwyattii @jeffra https://github.com/bigcode-project/bigcode-inference-benchmark

You can run

```shell
sh scripts/run_batch_size.sh ds-inference-1b-bloom-fp16
```

This will run BLOOM 1.3B (randomly initialized) using DS-inference in fp16 in batch sizes 1 to 128 (doubled...