Unable to run inference on BLOOM-2b5 using DeepSpeed
System Info
I was able to load the BLOOM-2b5 model in my Colab notebook for text generation (inference). When I use DeepSpeed to load the model and try to run inference, GPU memory is insufficient. I don't understand this, because with DeepSpeed I should be able to load a larger model, or at least the same one.
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- Load the 2b5 model: https://huggingface.co/bigscience/bloom-2b5
- Infer with and without DeepSpeed: https://huggingface.co/docs/transformers/main_classes/deepspeed (a minimal sketch of both paths is shown after this list)
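The original script was not shared, so the following is only a minimal sketch of the two inference paths being compared, assuming single-GPU fp16 inference; the model name comes from the reproduction step above, and the prompt and generation arguments are illustrative.

```python
# Hedged sketch: plain transformers inference vs. DeepSpeed-Inference.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-2b5"  # from the reproduction step above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Path 1 - plain inference (no DeepSpeed): the fp16 weights need ~5 GB of GPU memory.
# model = model.cuda()

# Path 2 - DeepSpeed-Inference with kernel injection. Note that this path keeps
# the full model on the GPU, so by itself it does not reduce memory requirements.
ds_engine = deepspeed.init_inference(
    model,
    mp_size=1,                       # tensor-parallel degree (1 = single GPU)
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
model = ds_engine.module

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```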
Expected behavior
CUDA error: insufficient memory
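For reference, the memory savings described on the linked DeepSpeed integration page come from ZeRO-3 with parameter offload rather than from kernel injection. The sketch below shows such a setup using the HfDeepSpeedConfig integration; all config values are illustrative and this is not the poster's actual configuration.

```python
# Hedged sketch: ZeRO-3 inference with CPU parameter offload, the DeepSpeed
# feature that actually lowers GPU memory needs. Values are illustrative.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.deepspeed import HfDeepSpeedConfig

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": 1,
}

# Must be created *before* from_pretrained so weights are loaded directly into
# ZeRO-3 partitions instead of materializing the full model on one device first.
dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-2b5")
engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
engine.module.eval()

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-2b5")
inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(engine.device)
with torch.no_grad():
    outputs = engine.module.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```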
Hey @Ravisankar13, could you please provide the script you used, the DeepSpeed config, as well as the full stack trace? It will be hard to help you with so little information.
cc @stas00
@Ravisankar13, please see the work in progress here: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/308
There you will find a variety of different working solutions.
We will soon move those here.
Thanks for your response. Let me try them and get back to you.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.