
assisted model offload

Open manjeetbhati opened this issue 1 year ago • 1 comment

Is there a library I could use to distribute model loading between GPU and CPU? I have a GPU with 16 GB of memory and tried https://huggingface.co/blog/assisted-generation (models up to 1.3B parameters work fine), but models with 6.7B parameters and beyond fail to load because they need more memory. Is there a library that lets me share the load between CPU and GPU?

manjeetbhati avatar Sep 07 '23 22:09 manjeetbhati

Check the big model inference doc from the accelerate library. If you are using the transformers library, you can use big model inference directly by passing device_map to from_pretrained!
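For illustration, a minimal sketch of what that can look like with transformers (the model ID and generation settings are placeholders, not something prescribed in this thread):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model ID; any large causal LM works the same way.
model_id = "facebook/opt-6.7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets Accelerate split the weights across the available
# GPU(s) and offload whatever does not fit in VRAM to CPU RAM.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,  # half precision roughly halves the memory footprint
)

# Inputs go on the first GPU; Accelerate moves activations between devices as needed.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

You can also pass a max_memory dict to from_pretrained to cap how much each device is allowed to hold, which is useful when you want to reserve headroom on the 16 GB GPU.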

SunMarc avatar Sep 12 '23 13:09 SunMarc