
Model not offloading to disk when RAM is full

Open hari10599 opened this issue 2 years ago • 3 comments

System Info

accelerate                    0.18.0
bitsandbytes                  0.38.1
diffusers                     0.15.1

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • [ ] My own task or dataset (give details below)

Reproduction

It looks like Accelerate is not offloading models to disk when RAM is occupied. Am I missing something?

Ran on a machine with 16 GB of RAM:

from transformers import AutoModelForCausalLM
import torch

checkpoint = "facebook/opt-6.7b"
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map="auto",
    offload_folder="offload",
    offload_state_dict=True,
    torch_dtype=torch.float16,
)

from transformers import Blip2ForConditionalGeneration
import torch

model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-flan-t5-xl",
    device_map="auto",
    offload_folder="offload_1",
    offload_state_dict=True,
    torch_dtype=torch.float16,
)

Expected behavior

As per the docs, shouldn't it offload the model to disk when the RAM is full?

hari10599 avatar Apr 25 '23 08:04 hari10599

Yes it does. Since you're not describing the problem you encountered, I'm not sure how we can help. You still need to have enough RAM to load the checkpoint shards (which are 10GB each).

sgugger avatar Apr 25 '23 13:04 sgugger

> Yes it does. Since you're not describing the problem you encountered, I'm not sure how we can help. You still need to have enough RAM to load the checkpoint shards (which are 10GB each).

Hmm, that makes sense. One more question: I need to load, say, 10 Stable Diffusion models of ~5 GB each on 16 GB of RAM. Is that possible? Does Accelerate take care of moving a model to RAM/GPU when needed?

hari10599 avatar Apr 25 '23 15:04 hari10599

Yes, but it will be very slow unless you have a very fast hard drive. You will also need to limit the RAM used by the first models (since Accelerate takes all the memory available by default) with the max_memory argument.

sgugger avatar Apr 25 '23 15:04 sgugger
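A minimal sketch of how max_memory caps placement, using Accelerate's infer_auto_device_map (the same placement logic device_map="auto" relies on). The toy model and the 10MB cap below are made-up stand-ins, chosen small so the effect is visible without downloading a real checkpoint:

    import torch
    from torch import nn
    from accelerate import infer_auto_device_map

    # Toy model standing in for a large checkpoint
    # (layer sizes are invented for illustration).
    model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(8)])

    # Cap how much RAM this model may use; modules that do not fit
    # under the cap get assigned to "disk" in the resulting device map.
    device_map = infer_auto_device_map(model, max_memory={"cpu": "10MB"})
    print(device_map)

In the same spirit, passing max_memory={0: "10GiB", "cpu": "4GiB"} to from_pretrained along with device_map="auto" would cap the first model's footprint so later models still have room (the exact sizes here are hypothetical and depend on your hardware).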

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 13 '23 15:06 github-actions[bot]