
assisted model offload

Open manjeetbhati opened this issue 1 year ago • 1 comment

Is there a library I could use to distribute model loading between GPU and CPU? I have a GPU with 16 GB of memory and tried https://huggingface.co/blog/assisted-generation (models up to 1.3B parameters work fine), but models with 6.7B parameters and beyond fail to load because they need more memory. Is there a library that lets me share the load between CPU and GPU?

manjeetbhati avatar Sep 07 '23 22:09 manjeetbhati

Check the big model inference doc from the accelerate library. If you are using the transformers library, you can use big model inference directly by passing device_map to from_pretrained!
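For illustration, a minimal sketch of what that can look like with transformers (the model ID and generation settings are placeholders, not something prescribed in this thread):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model ID; any large causal LM works the same way.
model_id = "facebook/opt-6.7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets Accelerate split the weights across the available
# GPU(s) and offload whatever does not fit in VRAM to CPU RAM.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,  # half precision roughly halves the memory footprint
)

# Inputs go on the first GPU; Accelerate moves activations between devices as needed.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

You can also pass a max_memory dict to from_pretrained to cap how much each device is allowed to hold, which is useful when you want to reserve headroom on the 16 GB GPU.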

SunMarc avatar Sep 12 '23 13:09 SunMarc