
Model not offloading to disk when RAM is full

Open hari10599 opened this issue 2 years ago • 3 comments

System Info

accelerate                    0.18.0
bitsandbytes                  0.38.1
diffusers                     0.15.1

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • [ ] My own task or dataset (give details below)

Reproduction

It looks like Accelerate is not offloading models to disk when RAM is occupied. Am I missing something?

Ran on a machine with 16 GB of RAM:

from transformers import AutoModelForCausalLM
import torch

checkpoint = "facebook/opt-6.7b"
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",
    offload_folder="offload",
    offload_state_dict=True,
    torch_dtype=torch.float16,
)

from transformers import Blip2ForConditionalGeneration
import torch

model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-flan-t5-xl",
    device_map="auto",
    offload_folder="offload_1",
    offload_state_dict=True,
    torch_dtype=torch.float16,
)

Expected behavior

As per the docs, shouldn't it offload the model to disk when the RAM is full?

hari10599 avatar Apr 25 '23 08:04 hari10599

Yes it does. Since you're not describing the problem you encountered, I'm not sure how we can help. You still need to have enough RAM to load the checkpoint shards (which are 10GB each).

sgugger avatar Apr 25 '23 13:04 sgugger

> Yes it does. Since you're not describing the problem you encountered, I'm not sure how we can help. You still need to have enough RAM to load the checkpoint shards (which are 10GB each).

Hmm, that makes sense. One more question: I need to load, say, 10 Stable Diffusion models of ~5 GB each on 16 GB of RAM. Is that possible? Does Accelerate take care of moving a model to RAM/GPU when needed?

hari10599 avatar Apr 25 '23 15:04 hari10599

Yes, but it will be very slow unless you have a very fast hard drive. You will also need to limit the RAM used by the first models (since Accelerate takes all the memory available by default) with the max_memory argument.

sgugger avatar Apr 25 '23 15:04 sgugger
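A minimal sketch of how max_memory caps placement, using Accelerate's infer_auto_device_map (the same placement logic device_map="auto" relies on). The toy model and the 10MB cap below are made-up stand-ins, chosen small so the effect is visible without downloading a real checkpoint:

    import torch
    from torch import nn
    from accelerate import infer_auto_device_map

    # Toy model standing in for a large checkpoint
    # (layer sizes are invented for illustration).
    model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(8)])

    # Cap how much RAM this model may use; modules that do not fit
    # under the cap get assigned to "disk" in the resulting device map.
    device_map = infer_auto_device_map(model, max_memory={"cpu": "10MB"})
    print(device_map)

In the same spirit, passing max_memory={0: "10GiB", "cpu": "4GiB"} to from_pretrained along with device_map="auto" would cap the first model's footprint so later models still have room (the exact sizes here are hypothetical and depend on your hardware).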

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 13 '23 15:06 github-actions[bot]