Pohung Huang
> deepspeed-inference+int8 is being worked on, please give us a bit of time. > > As you discovered the ds-inference script, it'll be shortly updated to support int8. Looking forward...
> Hmm, it's not working for me even within a single node with batch size = 1, 8x A100 80GB. Same here: CUDA illegal memory access error. See if "NCCL WARN...
> Actually, I just tried running with larger batch sizes (16 and 32) and it doesn't run into the "CUDA illegal memory access" error (as I did with batch size=2)....
@RezaYazdaniAminabadi could you point us to the documentation on how to "generate MP-sharded checkpoints"? So far we only have the 70 .bin files downloaded from Hugging Face. Do you mean...
@RezaYazdaniAminabadi Hi, just for your reference: I have tested [https://github.com/microsoft/DeepSpeed/pull/2196](https://github.com/microsoft/DeepSpeed/pull/2196), but it does not seem to resolve the "illegal memory access" issue on our side.
> @pohunghuang-nctu can you confirm your CUDA version? I was using 11.6 and getting the same issue. Using 11.3 resolved it for me. Please give it a try. Thanks @mayank31398...
> I only have a single node with 8 GPUs, 80GB each. Are you using pipeline parallel across nodes? Does DS-inference support that? 1. DS-inference definitely supports multi-node....