Mayank Mishra
Hmm, @thomasw21 so the PR I referred to above uses both the HF accelerate and DS-inference libraries, depending on which backend we want to run inference with. But it does require a transformers version...
@KMFODA currently, I am planning to create a standalone library. For now, I am adding to this repo itself.
@thomasw21, I am not sure how this differs from the PR I pointed to above ^^. Can you explain?
Oh, I think I understand the issue now. Maybe something like loading from the universal checkpoints and running inference, etc.?
@pohunghuang-nctu can you confirm your CUDA version? I was on 11.6 and hit the same issue; switching to 11.3 resolved it for me. Please give it a try. Thanks
@pohunghuang-nctu I have PyTorch installed via conda (with CUDA 11.3), and DeepSpeed and apex were built from their master branches against CUDA 11.3.
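For anyone else debugging the same mismatch: a quick sanity check is to compare the system CUDA toolkit version against the CUDA version PyTorch was built with. A minimal sketch (both checks are guarded, since `nvcc` or `torch` may not be present on a given box):

```shell
# System CUDA toolkit version, if nvcc is on PATH
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | grep release
fi

# CUDA version PyTorch was built against, if torch is installed
if command -v python3 >/dev/null 2>&1; then
  python3 - <<'EOF'
try:
    import torch
    print("torch built with CUDA:", torch.version.cuda)
except ImportError:
    print("torch not installed")
EOF
fi
```

If the two versions disagree (e.g. toolkit 11.6 vs. torch built for 11.3), extensions compiled from source (DeepSpeed, apex) can fail in exactly this way.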
I haven't played around with it that much, but batch size > 1 is working for me.
I only have a single node with 8 GPUs (80GB each). Are you using pipeline parallelism across nodes? Does DS-inference support that?
@pohunghuang-nctu @pai4451 thanks for letting me know about the multi-node deployment. I am guessing this would be using pipeline parallelism? However, what are the advantages of using multi-node during inference?...
I built DeepSpeed from source (master branch). Also, transformers (4.21.1) is installed using pip.