AttributeError: torch._dynamo.config.vocab_size does not exist
I can load the model using the code below:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "/root/private_data/models/Meta-Llama-3.1-70B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', load_in_4bit=True, attn_implementation="flash_attention_2")
tokenizer = AutoTokenizer.from_pretrained(model_id)
However, when I try to load the model using Unsloth, it shows the error below. Could you please tell me what went wrong?
from unsloth import FastLanguageModel
from transformers import TextStreamer
import re
from tqdm import tqdm
max_seq_length = 16384
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "/root/private_data/models/Meta-Llama-3.1-70B-Instruct",
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)
text_streamer = TextStreamer(tokenizer)
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))== Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.0.
\\ /| GPU: NVIDIA A800 80GB PCIe LC. Max memory: 79.138 GB. Platform = Linux.
O^O/ \_/ \ Pytorch: 2.3.0+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.26.post1. FA2 = True]
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Loading checkpoint shards: 100% 30/30 [08:33<00:00, 14.69s/it]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/torch/utils/_config_module.py:142, in ConfigModule.__getattr__(self, name)
141 try:
--> 142 return self._config[name]
143 except KeyError as e:
144 # make hasattr() work properly
KeyError: 'vocab_size'
The above exception was the direct cause of the following exception:
AttributeError Traceback (most recent call last)
Cell In[1], line 10
7 dtype = None
8 load_in_4bit = True
---> 10 model, tokenizer = FastLanguageModel.from_pretrained(
11 model_name = "/root/private_data/models/loras/writer_70b_v1_lora",
12 max_seq_length = max_seq_length,
13 dtype = dtype,
14 load_in_4bit = load_in_4bit,
15 )
17 FastLanguageModel.for_inference(model)
18 text_streamer = TextStreamer(tokenizer)
File /opt/conda/lib/python3.10/site-packages/unsloth/models/loader.py:272, in FastLanguageModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, resize_model_vocab, revision, *args, **kwargs)
269 tokenizer_name = None
270 pass
--> 272 model, tokenizer = dispatch_model.from_pretrained(
273 model_name = model_name,
274 max_seq_length = max_seq_length,
275 dtype = dtype,
276 load_in_4bit = load_in_4bit,
277 token = token,
278 device_map = device_map,
279 rope_scaling = rope_scaling,
280 fix_tokenizer = fix_tokenizer,
281 model_patcher = dispatch_model,
282 tokenizer_name = tokenizer_name,
283 trust_remote_code = trust_remote_code,
284 revision = revision if not is_peft else None,
285 *args, **kwargs,
286 )
288 if resize_model_vocab is not None:
289 model.resize_token_embeddings(resize_model_vocab)
File /opt/conda/lib/python3.10/site-packages/unsloth/models/llama.py:1403, in FastLlamaModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, model_patcher, tokenizer_name, trust_remote_code, **kwargs)
1393 tokenizer_name = model_name if tokenizer_name is None else tokenizer_name
1394 tokenizer = load_correct_tokenizer(
1395 tokenizer_name = tokenizer_name,
1396 model_max_length = max_position_embeddings,
(...)
1400 fix_tokenizer = fix_tokenizer,
1401 )
-> 1403 model, tokenizer = patch_tokenizer(model, tokenizer)
1404 model = model_patcher.post_patch(model)
1406 # Patch up QKV / O and MLP
File /opt/conda/lib/python3.10/site-packages/unsloth/models/_utils.py:470, in patch_tokenizer(model, tokenizer)
468 if len(check_pad_token) != 1:
469 possible_pad_token = None
--> 470 if check_pad_token[0] >= config.vocab_size:
471 possible_pad_token = None
472 pass
File /opt/conda/lib/python3.10/site-packages/torch/utils/_config_module.py:145, in ConfigModule.__getattr__(self, name)
142 return self._config[name]
143 except KeyError as e:
144 # make hasattr() work properly
--> 145 raise AttributeError(f"{self.__name__}.{name} does not exist") from e
AttributeError: torch._dynamo.config.vocab_size does not exist
I'm getting this error as well, following the Colab instructions with the meta-llama/Meta-Llama-3.1-8B-Instruct model. When I use the model from Unsloth instead, it works fine.
==((====))== Unsloth 2024.8: Fast Llama patching. Transformers = 4.43.2.
\\ /| GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \ Pytorch: 2.3.1+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.26.post1. FA2 = False]
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading checkpoint shards: 100%  4/4 [00:06<00:00,  1.32s/it]
KeyError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/torch/utils/_config_module.py in __getattr__(self, name)
141 try:
--> 142 return self._config[name]
143 except KeyError as e:
KeyError: 'vocab_size'
The above exception was the direct cause of the following exception:
AttributeError Traceback (most recent call last)
4 frames
/usr/local/lib/python3.10/dist-packages/unsloth/models/loader.py in from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, resize_model_vocab, revision, *args, **kwargs)
270 pass
271
--> 272 model, tokenizer = dispatch_model.from_pretrained(
273 model_name = model_name,
274 max_seq_length = max_seq_length,
/usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py in from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, model_patcher, tokenizer_name, trust_remote_code, **kwargs)
1401 )
1402
-> 1403 model, tokenizer = patch_tokenizer(model, tokenizer)
1404 model = model_patcher.post_patch(model)
1405
/usr/local/lib/python3.10/dist-packages/unsloth/models/_utils.py in patch_tokenizer(model, tokenizer)
468 if len(check_pad_token) != 1:
469 possible_pad_token = None
--> 470 if check_pad_token[0] >= config.vocab_size:
471 possible_pad_token = None
472 pass
/usr/local/lib/python3.10/dist-packages/torch/utils/_config_module.py in __getattr__(self, name)
143 except KeyError as e:
144 # make hasattr() work properly
--> 145 raise AttributeError(f"{self.__name__}.{name} does not exist") from e
146
147 def __delattr__(self, name):
AttributeError: torch._dynamo.config.vocab_size does not exist
I think it's related to commit 8001d30. Line 470 should use model.config instead of just config.
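For context, the name config in scope at unsloth/models/_utils.py line 470 resolves to torch._dynamo.config, which defines no vocab_size entry, so the attribute lookup fails exactly as in the traceback above. The snippet below reproduces the root cause in isolation and sketches the suspected one-token fix (reconstructed from the traceback, not the exact upstream diff):
# Reproduce the failure in isolation: torch._dynamo.config has no vocab_size key.
import torch._dynamo.config as dynamo_config
print(hasattr(dynamo_config, "vocab_size"))  # False
# dynamo_config.vocab_size  # -> AttributeError: torch._dynamo.config.vocab_size does not exist

# Suspected fix on line 470 (sketch):
# if check_pad_token[0] >= config.vocab_size:        # buggy: reads torch._dynamo.config
# if check_pad_token[0] >= model.config.vocab_size:  # fixed: reads the model's own config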
@danielhanchen sorry for tagging, but I think this is a breaking bug because it prevents loading the model.
This could be a temporary fix if you use Colab:
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git@bfe38e6ea8d3d7cf8ce9e37962de03c71c90cbe2"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes
I installed a specific commit from before 8001d30.
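If you want to double-check which Unsloth build the pin actually installed, querying the package metadata is enough (a small sketch; the printed string depends on whatever commit ends up installed):
# Print the installed Unsloth version to confirm the pin took effect.
from importlib.metadata import version
print(version("unsloth"))  # prints the version string of the installed build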
I believe this may not be a problem directly in Unsloth but rather with a dependency (another argument for pinned dependencies...). For me, even checking out https://github.com/unslothai/unsloth/tree/July-Llama-2024 fails with the above error. I did not have this problem with that version before.
Whoops - my bad, that's a bug! I accidentally wrote config instead of model.config! Updated the main branch! For local installations, please update Unsloth via:
pip uninstall unsloth -y
pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
(On Colab and Kaggle, just Disconnect and Delete Runtime.)
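After upgrading, re-running the loading snippet from the top of this issue is a quick way to confirm the fix. A minimal re-check along those lines (same arguments as the original report; substitute your own local model path):
# Re-run the original Unsloth load; with the patched main branch this should no
# longer hit the torch._dynamo.config.vocab_size AttributeError in patch_tokenizer.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/root/private_data/models/Meta-Llama-3.1-70B-Instruct",  # your local path here
    max_seq_length = 16384,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)
print("Loaded without the vocab_size error")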
Just wanted to confirm that the bug reported in this issue has been fixed. I pulled the latest changes from the main branch and tested the scenario. Everything is working perfectly now!
Thanks for the quick fix and the great work!
Issue resolved! Thanks for the fix :)