OverflowError with device="mps" using dedicated GPU
System Info
- 2019 Mac Pro
- AMD Radeon Pro W5700X 16 GB
- macOS Ventura 13.3
transformers-cli env:
- transformers version: 4.27.4
- Platform: macOS-10.16-x86_64-i386-64bit
- Python version: 3.9.16
- Huggingface_hub version: 0.13.3
- PyTorch version (GPU?): 2.1.0.dev20230403 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help?
No response
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Shell:
conda create -n transformerstest
conda activate transformerstest
conda install -c huggingface transformers
conda install pytorch torchvision torchaudio -c pytorch-nightly
Python:
from transformers import pipeline
generator = pipeline("text-generation", device="mps")
generator("In this course, we will teach you how to")
The system then compiles Metal shaders and does some work on the GPU, but the result is:
Traceback (most recent call last):
File "/Users/fabian/devel/transformers-course/test.py", line 4, in <module>
generator("In this course, we will teach you how to")
File "/usr/local/Caskroom/miniconda/base/envs/transformerstest/lib/python3.9/site-packages/transformers/pipelines/text_generation.py", line 209, in __call__
return super().__call__(text_inputs, **kwargs)
File "/usr/local/Caskroom/miniconda/base/envs/transformerstest/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1109, in __call__
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "/usr/local/Caskroom/miniconda/base/envs/transformerstest/lib/python3.9/site-packages/transformers/pipelines/base.py", line 1117, in run_single
outputs = self.postprocess(model_outputs, **postprocess_params)
File "/usr/local/Caskroom/miniconda/base/envs/transformerstest/lib/python3.9/site-packages/transformers/pipelines/text_generation.py", line 270, in postprocess
text = self.tokenizer.decode(
File "/usr/local/Caskroom/miniconda/base/envs/transformerstest/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 3476, in decode
return self._decode(
File "/usr/local/Caskroom/miniconda/base/envs/transformerstest/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 549, in _decode
text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
OverflowError: out of range integral type conversion attempted
Expected behavior
Generating output. This works on a MacBook Pro M1 with device="mps" (utilizing the GPU AFAICT) or on the Mac Pro without it (not utilizing GPU).
Thanks for your support!
This looks similar to #22529 and this is not a bug in Transformers but in PyTorch, so you will have to wait for them to release a fix.
Thanks for the quick answer!
Not holding my breath for a fix though. It's one out of 10K+ open issues in pytorch...
Yeah that's the same issue. It just got marked high priority a few minutes ago so they're definitely looking at it.
In the meantime you can get it working if you make some manual fixes to your local copy of transformers. Not pretty, but it works.
In brief, I worked around it locally by searching <python-install>/lib/python3.X/site-packages/transformers for all references to argmax and rewriting each relevant call from X.argmax(...) to X.max(...).indices. I think I changed 5 or 6 files in total. Which references are relevant depends on what you're doing. There are a ton of references under models/, but you only need to change the ones for models you actually use. I'm currently only looking at Llama models, and there were no calls to argmax under models/llama, so I didn't change any files under models/.
If you want to try that, I can send you a list of the files I had to change, relative to 4.28.0.dev0.
Then you'd also need to check your client code to see if it's making any of its own calls to torch.argmax, and change those too.
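For illustration, here is a minimal sketch of that substitution on a plain CPU tensor. The helper name argmax_via_max and the sample values are hypothetical, not code from transformers. One detail to watch: max() called without a dim returns only the value, so a call to argmax() with no dim has to flatten the tensor first.

```python
import torch


def argmax_via_max(x, dim=None, keepdim=False):
    """Hypothetical helper: mirrors x.argmax(dim, keepdim) using max(...).indices."""
    if dim is None:
        # argmax() with no dim indexes into the flattened tensor, while
        # max() with no dim returns only the value, so flatten first.
        return x.flatten().max(dim=0).indices
    return x.max(dim=dim, keepdim=keepdim).indices


# Made-up logits on the CPU, only to compare the two spellings.
logits = torch.tensor([[0.1, 2.5, -1.0], [3.0, 0.2, 0.7]])
print(logits.argmax(dim=-1))           # original call
print(argmax_via_max(logits, dim=-1))  # rewritten call
```

On the CPU both lines should print the same indices. Whether the rewritten form avoids the bad values on an affected MPS device is what the workaround above relies on; this sketch does not verify that.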
Finally, if you're using an Intel system with an AMD GPU, then due to a separate issue, https://github.com/pytorch/pytorch/issues/92752 , you also need to check for calls to torch.multinomial and rewrite those. There weren't any in transformers that affected me, but there was one in the client code I was using. I described how I changed that here: https://github.com/jankais3r/LLaMA_MPS/issues/14#issuecomment-1494959026 . Apparently Apple Silicon systems aren't affected by this bug.
It's a bit of a mess at the moment due to those MPS bugs - but it is possible to get it working if you're willing to hack transformers and check your client code.
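To find the candidate call sites, something like the following could be used instead of searching by hand. It is only a sketch: find_calls is a hypothetical helper, and the regular expression is a rough text match that will also flag comments and unrelated functions with the same name, so each hit still needs a manual look.

```python
import re
from pathlib import Path

# Rough text match for calls such as x.argmax(...) or torch.multinomial(...).
PATTERN = re.compile(r"\b(argmax|multinomial)\s*\(")


def find_calls(root):
    """Return (path, line number, stripped line) for every matching line in *.py files under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling find_calls on the transformers directory under site-packages, and then on your client code, gives the list of lines to review.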
It just got marked high priority a few minutes ago so they're definitely looking at it.
I pinged the PyTorch team on it ;-)
Much appreciated!
Actually running LLaMa was my goal, I was just trying something simpler first.
Now I tried LLaMa using the following:
from transformers import AutoTokenizer, LlamaForCausalLM, pipeline
model = LlamaForCausalLM.from_pretrained("/path/to/models/llama-7b/")
tokenizer = AutoTokenizer.from_pretrained("/path/to/models/llama-7b/")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device="mps")
pipe("In this course, we will teach you how to")
Result:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Caskroom/miniconda/base/envs/textgen/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 209, in __call__
return super().__call__(text_inputs, **kwargs)
File "/usr/local/Caskroom/miniconda/base/envs/textgen/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1109, in __call__
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "/usr/local/Caskroom/miniconda/base/envs/textgen/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1117, in run_single
outputs = self.postprocess(model_outputs, **postprocess_params)
File "/usr/local/Caskroom/miniconda/base/envs/textgen/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 270, in postprocess
text = self.tokenizer.decode(
File "/usr/local/Caskroom/miniconda/base/envs/textgen/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3485, in decode
return self._decode(
File "/usr/local/Caskroom/miniconda/base/envs/textgen/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 931, in _decode
filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
File "/usr/local/Caskroom/miniconda/base/envs/textgen/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 912, in convert_ids_to_tokens
tokens.append(self._convert_id_to_token(index))
File "/usr/local/Caskroom/miniconda/base/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 119, in _convert_id_to_token
token = self.sp_model.IdToPiece(index)
File "/usr/local/Caskroom/miniconda/base/envs/textgen/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1045, in _batched_func
return _func(self, arg)
File "/usr/local/Caskroom/miniconda/base/envs/textgen/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1038, in _func
raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.
Which sounds like "minus nine trillion something" indices happening somewhere again. I didn't find "multinomial" or "argmax" under models/llama, but it's possible of course that those functions are called somewhere else.
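One way to check that suspicion before the tokenizer raises is to compare the generated ids against the vocabulary size. A minimal sketch, assuming the ids are available as a plain list of ints: find_invalid_token_ids is a hypothetical helper, and the ids and vocab_size below are made-up illustration values, not output from this run.

```python
def find_invalid_token_ids(token_ids, vocab_size):
    """Return (position, id) pairs for every id outside the range [0, vocab_size)."""
    return [
        (pos, int(tok))
        for pos, tok in enumerate(token_ids)
        if not 0 <= int(tok) < vocab_size
    ]


# Made-up ids; the huge negative value stands in for the suspected bad index.
ids = [1, 512, -9223372036854775808, 42]
print(find_invalid_token_ids(ids, vocab_size=32000))
# prints [(2, -9223372036854775808)]
```

An empty result would point away from out-of-range indices as the cause of the IndexError.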
Yes, it is not referenced anywhere under models/llama but is referenced multiple other places throughout transformers. In my earlier reply I described the process I followed to change those.
That test code works for me with my locally hacked copy of transformers.
Code:
from transformers import LlamaTokenizer, LlamaForCausalLM, pipeline
model = LlamaForCausalLM.from_pretrained("/Users/tomj/src/llama.cpp/models/llama-7b-HF")
tokenizer = LlamaTokenizer.from_pretrained("/Users/tomj/src/llama.cpp/models/llama-7b-HF")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device="mps")
print(pipe("In this course, we will teach you how to"))
Output:
tomj@Eddie ~/src $ ~/anaconda3/envs/torch21/bin/python ./test_llama.py
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:20<00:00, 1.61it/s]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
/Users/tomj/anaconda3/envs/torch21/lib/python3.10/site-packages/transformers/generation/utils.py:1219: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
/Users/tomj/anaconda3/envs/torch21/lib/python3.10/site-packages/transformers/generation/utils.py:1313: UserWarning: Using `max_length`'s default (20) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
warnings.warn(
[{'generated_text': 'In this course, we will teach you how to use the most popular and powerful tools in the industry'}]
I get the same error with torch nightly version 2.1.0.dev20230428 and 'MPS' on a 2020 iMac 27" with an AMD Radeon 5700 XT GPU, in https://github.com/andreamad8/FSB
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.