AttributeError: torch._dynamo.config.vocab_size does not exist
I can load the model using the code below:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "/root/private_data/models/Meta-Llama-3.1-70B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', load_in_4bit=True, attn_implementation="flash_attention_2")
tokenizer = AutoTokenizer.from_pretrained(model_id)
However, when I try to load the model using Unsloth, it shows the error below. Could you please tell me what went wrong?
from unsloth import FastLanguageModel
from transformers import TextStreamer
import re
from tqdm import tqdm
max_seq_length = 16384
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "/root/private_data/models/Meta-Llama-3.1-70B-Instruct",
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)
text_streamer = TextStreamer(tokenizer)
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))== Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.0.
\\ /| GPU: NVIDIA A800 80GB PCIe LC. Max memory: 79.138 GB. Platform = Linux.
O^O/ \_/ \ Pytorch: 2.3.0+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.26.post1. FA2 = True]
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Loading checkpoint shards: 100% 30/30 [08:33<00:00, 14.69s/it]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/torch/utils/_config_module.py:142, in ConfigModule.__getattr__(self, name)
141 try:
--> 142 return self._config[name]
143 except KeyError as e:
144 # make hasattr() work properly
KeyError: 'vocab_size'
The above exception was the direct cause of the following exception:
AttributeError Traceback (most recent call last)
Cell In[1], line 10
7 dtype = None
8 load_in_4bit = True
---> 10 model, tokenizer = FastLanguageModel.from_pretrained(
11 model_name = "/root/private_data/models/loras/writer_70b_v1_lora",
12 max_seq_length = max_seq_length,
13 dtype = dtype,
14 load_in_4bit = load_in_4bit,
15 )
17 FastLanguageModel.for_inference(model)
18 text_streamer = TextStreamer(tokenizer)
File /opt/conda/lib/python3.10/site-packages/unsloth/models/loader.py:272, in FastLanguageModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, resize_model_vocab, revision, *args, **kwargs)
269 tokenizer_name = None
270 pass
--> 272 model, tokenizer = dispatch_model.from_pretrained(
273 model_name = model_name,
274 max_seq_length = max_seq_length,
275 dtype = dtype,
276 load_in_4bit = load_in_4bit,
277 token = token,
278 device_map = device_map,
279 rope_scaling = rope_scaling,
280 fix_tokenizer = fix_tokenizer,
281 model_patcher = dispatch_model,
282 tokenizer_name = tokenizer_name,
283 trust_remote_code = trust_remote_code,
284 revision = revision if not is_peft else None,
285 *args, **kwargs,
286 )
288 if resize_model_vocab is not None:
289 model.resize_token_embeddings(resize_model_vocab)
File /opt/conda/lib/python3.10/site-packages/unsloth/models/llama.py:1403, in FastLlamaModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, model_patcher, tokenizer_name, trust_remote_code, **kwargs)
1393 tokenizer_name = model_name if tokenizer_name is None else tokenizer_name
1394 tokenizer = load_correct_tokenizer(
1395 tokenizer_name = tokenizer_name,
1396 model_max_length = max_position_embeddings,
(...)
1400 fix_tokenizer = fix_tokenizer,
1401 )
-> 1403 model, tokenizer = patch_tokenizer(model, tokenizer)
1404 model = model_patcher.post_patch(model)
1406 # Patch up QKV / O and MLP
File /opt/conda/lib/python3.10/site-packages/unsloth/models/_utils.py:470, in patch_tokenizer(model, tokenizer)
468 if len(check_pad_token) != 1:
469 possible_pad_token = None
--> 470 if check_pad_token[0] >= config.vocab_size:
471 possible_pad_token = None
472 pass
File /opt/conda/lib/python3.10/site-packages/torch/utils/_config_module.py:145, in ConfigModule.__getattr__(self, name)
142 return self._config[name]
143 except KeyError as e:
144 # make hasattr() work properly
--> 145 raise AttributeError(f"{self.__name__}.{name} does not exist") from e
AttributeError: torch._dynamo.config.vocab_size does not exist
I'm getting this error as well, following the Colab instructions with the meta-llama/Meta-Llama-3.1-8B-Instruct model. When I use the model from Unsloth instead, it works fine.
==((====))== Unsloth 2024.8: Fast Llama patching. Transformers = 4.43.2.
\\ /| GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \ Pytorch: 2.3.1+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.26.post1. FA2 = False]
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading checkpoint shards: 100%  4/4 [00:06<00:00,  1.32s/it]
KeyError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/torch/utils/_config_module.py in __getattr__(self, name)
141 try:
--> 142 return self._config[name]
143 except KeyError as e:
KeyError: 'vocab_size'
The above exception was the direct cause of the following exception:
AttributeError Traceback (most recent call last)
4 frames
/usr/local/lib/python3.10/dist-packages/unsloth/models/loader.py in from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, resize_model_vocab, revision, *args, **kwargs)
270 pass
271
--> 272 model, tokenizer = dispatch_model.from_pretrained(
273 model_name = model_name,
274 max_seq_length = max_seq_length,
/usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py in from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, model_patcher, tokenizer_name, trust_remote_code, **kwargs)
1401 )
1402
-> 1403 model, tokenizer = patch_tokenizer(model, tokenizer)
1404 model = model_patcher.post_patch(model)
1405
/usr/local/lib/python3.10/dist-packages/unsloth/models/_utils.py in patch_tokenizer(model, tokenizer)
468 if len(check_pad_token) != 1:
469 possible_pad_token = None
--> 470 if check_pad_token[0] >= config.vocab_size:
471 possible_pad_token = None
472 pass
/usr/local/lib/python3.10/dist-packages/torch/utils/_config_module.py in __getattr__(self, name)
143 except KeyError as e:
144 # make hasattr() work properly
--> 145 raise AttributeError(f"{self.__name__}.{name} does not exist") from e
146
147 def __delattr__(self, name):
AttributeError: torch._dynamo.config.vocab_size does not exist
I think it's related to commit 8001d30. Line 470 should use model.config instead of just config.
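For context, the name config in scope at unsloth/models/_utils.py line 470 resolves to torch._dynamo.config, which defines no vocab_size entry, so the attribute lookup fails exactly as in the traceback above. The snippet below reproduces the root cause in isolation and sketches the suspected one-token fix (reconstructed from the traceback, not the exact upstream diff):
# Reproduce the failure in isolation: torch._dynamo.config has no vocab_size key.
import torch._dynamo.config as dynamo_config
print(hasattr(dynamo_config, "vocab_size"))  # False
# dynamo_config.vocab_size  # -> AttributeError: torch._dynamo.config.vocab_size does not exist

# Suspected fix on line 470 (sketch):
# if check_pad_token[0] >= config.vocab_size:        # buggy: reads torch._dynamo.config
# if check_pad_token[0] >= model.config.vocab_size:  # fixed: reads the model's own config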
@danielhanchen sorry for tagging, but I think this is a breaking bug because it prevents loading the model.
This could be a temporary fix if you use Colab:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git@bfe38e6ea8d3d7cf8ce9e37962de03c71c90cbe2"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes
I installed a specific commit from before 8001d30.
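If you want to double-check which Unsloth build the pin actually installed, querying the package metadata is enough (a small sketch; the printed string depends on whatever commit ends up installed):
# Print the installed Unsloth version to confirm the pin took effect.
from importlib.metadata import version
print(version("unsloth"))  # prints the version string of the installed build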
I believe this may not be a problem directly in Unsloth but rather with a dependency (another argument for pinned dependencies...). For me, even checking out https://github.com/unslothai/unsloth/tree/July-Llama-2024 fails with the above error. I did not have this problem with that version before.
Whoops - my bad, that's a bug! I accidentally wrote config instead of model.config! Updated the main branch! For local installations, please update Unsloth via:
pip uninstall unsloth -y
pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
(On Colab and Kaggle, just Disconnect and Delete Runtime.)
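After upgrading, re-running the loading snippet from the top of this issue is a quick way to confirm the fix. A minimal re-check along those lines (same arguments as the original report; substitute your own local model path):
# Re-run the original Unsloth load; with the patched main branch this should no
# longer hit the torch._dynamo.config.vocab_size AttributeError in patch_tokenizer.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/root/private_data/models/Meta-Llama-3.1-70B-Instruct",  # your local path here
    max_seq_length = 16384,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)
print("Loaded without the vocab_size error")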
Just wanted to confirm that the bug reported in this issue has been fixed. I pulled the latest changes from the main branch and tested the scenario. Everything is working perfectly now!
Thanks for the quick fix and the great work!
Issue resolved! Thanks for the fix :)