transformers
device_map='"auto" fails with in big_modelling.py
System Info
Ubuntu 20.04 Linux on a Ryzen 7 3900 CPU, 32 GB RAM, an Nvidia RTX 3070 GPU, and an M.2 SSD with plenty of free space.
Latest versions of mkl, CPU-only PyTorch, transformers, and accelerate in a freshly created venv.
Who can help?
@LysandreJik, @Narsil Sorry, no specific person is listed for the Transformers or Accelerate libs.
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
Clone the bloom-560m data with git into a directory. In this example the directory is /media/Data/ai/bloom-560.
Run the following Python file:
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
import torch
from time import time
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("/media/Data/ai/bloom/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("/media/Data/ai/bloom/bloom-560m",device_map="auto",torch_dtype=torch.float32)
pipe = pipeline('text-generation',model=model, tokenizer=tokenizer, torch_dtype=torch.float32)
def local_inf(prompt, temperature=0.7, top_p=None, max_new_tokens=32, repetition_penalty=None, do_sample=False, num_return_sequences=1):
    response = pipe(f"{prompt}",
                    temperature = temperature, # 0 to 1
                    top_p = top_p, # None, 0-1
                    max_new_tokens = max_new_tokens, # up to 2047 theoretically
                    return_full_text = False, # include prompt or not
                    repetition_penalty = repetition_penalty, # None, 0-100 (penalty for repeated tokens)
                    do_sample = do_sample, # True: use sampling, False: greedy decoding
                    num_return_sequences = num_return_sequences)
    return print(prompt + response[0]['generated_text']), response[0]['generated_text']
inp = """# Use OpenCV in Python"""
t = time()
resp = local_inf(inp, max_new_tokens=64)
delta = time() - t
print("Inference took %0.2f s." % delta)
Running it results in the error shown below.
File "/home/luk/dev/ai/bloom-560-cpu-testing/test1.py", line 12, in <module>
model = AutoModelForCausalLM.from_pretrained("/media/Data/ai/bloom/bloom-560m",device_map="auto",torch_dtype=torch.float32)
File "/home/luk/dev/.env_mlk_2022/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 446, in from_pretrained
return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
File "/home/luk/dev/.env_mlk_2022/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2179, in from_pretrained
dispatch_model(model, device_map=device_map, offload_dir=offload_folder)
File "/home/luk/dev/.env_mlk_2022/lib/python3.8/site-packages/accelerate/big_modeling.py", line 215, in dispatch_model
main_device = [d for d in device_map.values() if d not in ["cpu", "disk"]][0]
IndexError: list index out of range
Expected behavior
The code is expected to run and perform inference with the model. However, it only runs when the workaround described below is applied.
Additional information:
This seems to happen because line 215 in accelerate/big_modeling.py will not select the CPU as the main device when device_map is set to "auto". This is the relevant bit:
if main_device is None:
    main_device = [d for d in device_map.values() if d not in ["cpu", "disk"]][0]
While transformers/modeling_utils.py line 2179 calls it like this:
if device_map is not None:
    dispatch_model(model, device_map=device_map, offload_dir=offload_folder)
With no obvious way to specify the main_device.
The problem can be worked around by temporarily changing line 2179 of modeling_utils.py to:
dispatch_model(model, device_map=device_map, offload_dir=offload_folder, main_device='cpu')
Alternatively, line 215 of big_modeling.py can be changed to:
main_device = [d for d in device_map.values() if d not in ["disk"]][0]
"disk" shouldn't be the main device, but the cpu seems to be a perfectly acceptable alternative in absence of GPU.
Hi,
Which version of transformers + accelerate are you using?
cc @muellerzr, have you seen something similar in the past?
transformers version 4.21.1, accelerate version 0.12.0
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
There is no support for using the CPU as a main device in Accelerate yet. If you want to use the model on CPU, just don't specify device_map="auto".
Not quite sure why your GPU is not visible to torch since you mention having an RTX3070, but that's the crux of the issue here. Maybe it does not have enough RAM available to host the largest layer of the model?
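For reference, a quick way to check what torch can actually see (just a sanity check, output depends on your setup):
import torch

# If CUDA is not visible, there is no GPU for device_map="auto" to pick as the main device.
print(torch.cuda.is_available())
print(torch.cuda.device_count())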
Thank you for the reply and the information that the CPU is not supported as the main device. Yes, I do have an RTX GPU; however, it is not visible because I wanted to run on CPU only. There is value in being able to use the CPU as the main device, since it usually has a much larger contiguous RAM region available.
Wouldn't leaving out device_map="auto" stop offloading data to disk altogether? In any case, hopefully running on the CPU as the main device will be supported in the future. In the meantime, one can use the workaround as specified, or not use device_map="auto" as you mentioned.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Note that we just merged support for device_map="auto" to work in a CPU-only env. Disk offload when executing on CPU might not work yet, but if your model fits into RAM, you won't have this error anymore (requires an install from source of Accelerate).
device_map="auto" ——> device_map={"": "cpu"}