
device_map="auto" fails in big_modeling.py

Open · lukolszewski opened this issue 3 years ago · 5 comments

System Info

Ubuntu 20.04 Linux on a Ryzen 7 3900 CPU, 32GB RAM, an Nvidia RTX 3070 GPU, and an M.2 SSD with plenty of free space.

Latest versions of MKL, CPU-only PyTorch, transformers, and accelerate in a freshly created venv.

Who can help?

@LysandreJik, @Narsil Sorry, no person is specified for the Transformers or Accelerate libs.

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

Clone the bloom-560m model with git into a directory. In this example the directory is /media/Data/ai/bloom/bloom-560m.

Run the following Python file:

from time import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("/media/Data/ai/bloom/bloom-560m")
model = AutoModelForCausalLM.from_pretrained(
    "/media/Data/ai/bloom/bloom-560m", device_map="auto", torch_dtype=torch.float32
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, torch_dtype=torch.float32)

def local_inf(prompt, temperature=0.7, top_p=None, max_new_tokens=32,
              repetition_penalty=None, do_sample=False, num_return_sequences=1):
    response = pipe(
        prompt,
        temperature=temperature,                # 0 to 1
        top_p=top_p,                            # None, or 0-1
        max_new_tokens=max_new_tokens,          # up to 2047 theoretically
        return_full_text=False,                 # whether to include the prompt in the output
        repetition_penalty=repetition_penalty,  # None, or 0-100 (penalty for repeated tokens)
        do_sample=do_sample,                    # True: use sampling, False: greedy decoding
        num_return_sequences=num_return_sequences,
    )
    print(prompt + response[0]["generated_text"])
    return response[0]["generated_text"]

inp = """# Use OpenCV in Python"""
t = time()
resp = local_inf(inp, max_new_tokens=64)
delta = time() - t
print("Inference took %0.2f s." % delta)

The error shown below results:

File "/home/luk/dev/ai/bloom-560-cpu-testing/test1.py", line 12, in <module>
    model = AutoModelForCausalLM.from_pretrained("/media/Data/ai/bloom/bloom-560m",device_map="auto",torch_dtype=torch.float32)
  File "/home/luk/dev/.env_mlk_2022/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 446, in from_pretrained
    return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
  File "/home/luk/dev/.env_mlk_2022/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2179, in from_pretrained
    dispatch_model(model, device_map=device_map, offload_dir=offload_folder)
  File "/home/luk/dev/.env_mlk_2022/lib/python3.8/site-packages/accelerate/big_modeling.py", line 215, in dispatch_model
    main_device = [d for d in device_map.values() if d not in ["cpu", "disk"]][0]
IndexError: list index out of range

Expected behavior

The code is expected to run and perform inference with the model. However, it only runs when the workaround described below is applied.

Additional information:

This seems to happen because line 215 in accelerate/big_modeling.py refuses to select the CPU as the main device when device_map is set to "auto". This is the relevant bit:

if main_device is None:
    main_device = [d for d in device_map.values() if d not in ["cpu", "disk"]][0]

While transformers/modeling_utils.py line 2179 calls it like this:

if device_map is not None:
    dispatch_model(model, device_map=device_map, offload_dir=offload_folder)

There is no obvious way to specify main_device.

The problem can be worked around by temporarily changing line 2179 of modeling_utils.py to: dispatch_model(model, device_map=device_map, offload_dir=offload_folder, main_device='cpu')

Or line 215 of big_modeling.py can be changed to: main_device = [d for d in device_map.values() if d not in ["disk"]][0]

"disk" shouldn't be the main device, but the cpu seems to be a perfectly acceptable alternative in absence of GPU.

lukolszewski avatar Aug 19 '22 17:08 lukolszewski

Hi,

Which versions of transformers + accelerate are you using?

Narsil avatar Aug 22 '22 09:08 Narsil

cc @muellerzr, have you seen something similar in the past?

LysandreJik avatar Aug 24 '22 09:08 LysandreJik

transformers version 4.21.1, accelerate version 0.12.0

lukolszewski avatar Aug 27 '22 08:08 lukolszewski

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Sep 20 '22 15:09 github-actions[bot]

There is no support for using the CPU as the main device in Accelerate yet. If you want to use the model on CPU, just don't specify device_map="auto".

Not quite sure why your GPU is not visible to torch since you mention having an RTX 3070, but that's the crux of the issue here. Maybe it does not have enough RAM available to host the largest layer of the model?
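
For concreteness, a minimal sketch of that suggestion, reusing the path from the report above; with no device_map argument, from_pretrained never calls dispatch_model and the model simply lives in CPU RAM:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/media/Data/ai/bloom/bloom-560m",  # local path from the report
    torch_dtype=torch.float32,          # no device_map, so no dispatch/offload logic runs
)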

sgugger avatar Sep 20 '22 15:09 sgugger

Thank you for the reply and the info that the CPU is not supported as the main device. Yes, I do have an RTX GPU; however, it is not visible because I wanted to run on CPU only. There is value in being able to use the CPU as the main device, since it (usually) has a much larger contiguous RAM region available.

Wouldn't leaving out device_map="auto" stop offloading data to disk altogether? In any case, hopefully running with the CPU as the main device will be supported in the future; in the meantime, one can use the workaround described above, or not use device_map="auto" as you mentioned.
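
For reference, disk offload is only triggered through the dispatch path, e.g. when a device_map is passed together with an offload folder. A hedged sketch (the offload directory below is hypothetical, and this presupposes the main-device selection succeeds):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/media/Data/ai/bloom/bloom-560m",
    device_map="auto",
    offload_folder="/media/Data/ai/offload",  # hypothetical directory for weights spilled to disk
    torch_dtype=torch.float32,
)

Without a device_map, none of this machinery runs, so nothing is offloaded to disk.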

lukolszewski avatar Sep 27 '22 21:09 lukolszewski

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Oct 22 '22 15:10 github-actions[bot]

Note that we just merged support for device_map="auto" to work on a CPU-only env. Disk offload when executing on CPU might not work yet, but if your model fits into RAM, you won't see this error anymore (requires an install from source of Accelerate).

sgugger avatar Oct 31 '22 16:10 sgugger

device_map="auto" ——> device_map={"": "cpu"}

miandai avatar Jul 22 '23 15:07 miandai