unsloth Qwen2-VL-72B 4Bit won't load

Open kozzy97 opened this issue 1 month ago • 18 comments

Hi!

I am having trouble loading the Qwen2-VL-72B models. Here's a minimal code snippet, which I run on a local machine using the standard conda install from the docs. I have tried both a 32 GB Tesla V100 and an 80 GB A100.

import torch
from trl import SFTTrainer, SFTConfig
from unsloth import FastVisionModel, is_bf16_supported # FastLanguageModel for LLMs
from unsloth.trainer import UnslothVisionDataCollator

model_hf_path = 'unsloth/Qwen2-VL-72B-Instruct-bnb-4bit'
model, tokenizer = FastVisionModel.from_pretrained(
    model_hf_path,
    load_in_4bit = True,
    use_gradient_checkpointing = "unsloth",
)

I get the following error:

Traceback (most recent call last):
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 406, in hf_raise_for_status
    response.raise_for_status()
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/unsloth/qwen2-vl-72b-instruct-unsloth-bnb-4bit

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/huggingface_hub/hf_file_system.py", line 125, in _repo_and_revision_exist
    self._api.repo_info(
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 2748, in repo_info
    return method(
           ^^^^^^^
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 2533, in model_info
    hf_raise_for_status(r)
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 454, in hf_raise_for_status
    raise _format(RepositoryNotFoundError, message, response) from e
huggingface_hub.errors.RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-677bb76a-7e60f5736021adae14bf2f7a;410212d4-27c9-42bb-9c2d-4d27d7962042)

Repository Not Found for url: https://huggingface.co/api/models/unsloth/qwen2-vl-72b-instruct-unsloth-bnb-4bit.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/models/loader.py", line 413, in from_pretrained
    files = HfFileSystem(token = token).glob(os.path.join(model_name, "*.json"))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/huggingface_hub/hf_file_system.py", line 520, in glob
    path = self.resolve_path(path, revision=kwargs.get("revision")).unresolve()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/huggingface_hub/hf_file_system.py", line 216, in resolve_path
    _raise_file_not_found(path, err)
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/huggingface_hub/hf_file_system.py", line 1136, in _raise_file_not_found
    raise FileNotFoundError(msg) from err
FileNotFoundError: unsloth/qwen2-vl-72b-instruct-unsloth-bnb-4bit/*.json (repository not found)
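Note that the 404 above is for `unsloth/qwen2-vl-72b-instruct-unsloth-bnb-4bit`, not for the `unsloth/Qwen2-VL-72B-Instruct-bnb-4bit` id passed to `from_pretrained`. A minimal sketch for checking which of the two ids exists on the Hub, assuming `huggingface_hub` is installed in a version that provides `HfApi.repo_exists`; the helper name `check_repos` is hypothetical and not part of unsloth:

```python
from typing import Callable, Dict, Iterable, Optional


def check_repos(
    repo_ids: Iterable[str],
    exists: Optional[Callable[[str], bool]] = None,
) -> Dict[str, bool]:
    """Map each candidate repo id to True/False depending on whether it exists."""
    if exists is None:
        # Imported lazily so the helper can be exercised with a stub checker.
        # Assumption: the installed huggingface_hub exposes HfApi.repo_exists.
        from huggingface_hub import HfApi

        exists = HfApi().repo_exists
    return {repo_id: bool(exists(repo_id)) for repo_id in repo_ids}


# Example (needs network access); both ids are copied from the snippet and
# the traceback above:
# print(check_repos([
#     "unsloth/Qwen2-VL-72B-Instruct-bnb-4bit",
#     "unsloth/qwen2-vl-72b-instruct-unsloth-bnb-4bit",
# ]))
```

If the first id exists and the second does not, that would point at the name the loader looks up rather than at the id in my snippet, but this is only a diagnostic, not a fix.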

Everything works perfectly for the 2B and 7B models with these parameters. Furthermore, I do not get this error when I load the 72B model with load_in_4bit = False. In that case the model starts to load, but then I get the following error, which suggests that I am running out of memory (even on an A100), so I would really like to load in 4-bit!

Loading checkpoint shards:   0%|          | 0/31 [00:00<?, ?it/s]
Loading checkpoint shards:   3%|▎         | 1/31 [00:04<02:22,  4.75s/it]
Loading checkpoint shards:   6%|▋         | 2/31 [00:06<01:33,  3.22s/it]
Loading checkpoint shards:  10%|▉         | 3/31 [00:09<01:17,  2.77s/it]
Loading checkpoint shards:  13%|█▎        | 4/31 [00:11<01:07,  2.52s/it]
Loading checkpoint shards:  16%|█▌        | 5/31 [00:15<01:23,  3.22s/it]
Loading checkpoint shards:  19%|█▉        | 6/31 [00:20<01:37,  3.92s/it]
Loading checkpoint shards:  23%|██▎       | 7/31 [00:25<01:37,  4.06s/it]
Loading checkpoint shards:  26%|██▌       | 8/31 [00:29<01:35,  4.17s/it]
Loading checkpoint shards:  29%|██▉       | 9/31 [00:34<01:36,  4.40s/it]
Loading checkpoint shards:  32%|███▏      | 10/31 [00:38<01:30,  4.32s/it]
Loading checkpoint shards:  35%|███▌      | 11/31 [00:43<01:29,  4.47s/it]
Loading checkpoint shards:  39%|███▊      | 12/31 [00:49<01:30,  4.79s/it]
Loading checkpoint shards:  42%|████▏     | 13/31 [00:54<01:27,  4.86s/it]
Loading checkpoint shards:  45%|████▌     | 14/31 [00:59<01:25,  5.02s/it]
Loading checkpoint shards:  48%|████▊     | 15/31 [01:05<01:27,  5.45s/it]
Loading checkpoint shards:  52%|█████▏    | 16/31 [01:10<01:19,  5.29s/it]
Loading checkpoint shards:  55%|█████▍    | 17/31 [01:14<01:07,  4.84s/it]
Loading checkpoint shards:  58%|█████▊    | 18/31 [01:14<00:44,  3.43s/it]
Loading checkpoint shards:  61%|██████▏   | 19/31 [01:15<00:29,  2.45s/it]
Loading checkpoint shards:  65%|██████▍   | 20/31 [01:15<00:19,  1.76s/it]
Loading checkpoint shards:  68%|██████▊   | 21/31 [01:15<00:12,  1.28s/it]
Loading checkpoint shards:  71%|███████   | 22/31 [01:15<00:08,  1.06it/s]
Loading checkpoint shards:  74%|███████▍  | 23/31 [01:15<00:05,  1.41it/s]
Loading checkpoint shards:  77%|███████▋  | 24/31 [01:15<00:03,  1.81it/s]
Loading checkpoint shards:  81%|████████  | 25/31 [01:16<00:02,  2.28it/s]
Loading checkpoint shards:  84%|████████▍ | 26/31 [01:16<00:01,  2.78it/s]
Loading checkpoint shards:  87%|████████▋ | 27/31 [01:16<00:01,  3.27it/s]
Loading checkpoint shards:  90%|█████████ | 28/31 [01:16<00:00,  3.75it/s]
Loading checkpoint shards:  94%|█████████▎| 29/31 [01:16<00:00,  4.18it/s]
Loading checkpoint shards:  97%|█████████▋| 30/31 [01:16<00:00,  4.60it/s]
Loading checkpoint shards: 100%|██████████| 31/31 [01:17<00:00,  5.15it/s]
Loading checkpoint shards: 100%|██████████| 31/31 [01:17<00:00,  2.48s/it] 
Some parameters are on the meta device because they were offloaded to the cpu.
Traceback (most recent call last):
  File "./finetuning/src/finetune.py", line 106, in <module>
    finetune(args.llm, args.split, args.data_path, args.res_path)
  File "./finetuning/src/finetune.py", line 44, in finetune
    trainer = SFTTrainer(
              ^^^^^^^^^^^
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/trainer.py", line 203, in new_init
    original_init(self, *args, **kwargs)
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/trl/trainer/sft_trainer.py", line 307, in __init__
    super().__init__(
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/transformers/trainer.py", line 574, in __init__
    self._move_model_to_device(model, args.device)
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/transformers/trainer.py", line 846, in _move_model_to_device
    model = model.to(device)
            ^^^^^^^^^^^^^^^^
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  [Previous line repeated 5 more times]
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "$HOME/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1333, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
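For context on why the unquantized load ends up offloading parameters, here is a rough back-of-the-envelope estimate of the memory needed for the weights alone. It assumes roughly 72 billion parameters (inferred from the model name, not an exact count) and ignores quantization overhead, activations, and optimizer state; `weight_memory_gb` is a hypothetical helper:

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: ~72e9 parameters, taken from the "72B" in the model name.
N_PARAMS = 72e9

for label, bits in [("16-bit", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
# 16-bit: ~144 GB
# 4-bit: ~36 GB
```

Under these assumptions the 16-bit weights do not fit on a single 80 GB A100, which is consistent with the "offloaded to the cpu" message above, while the 4-bit weights should fit on the A100 but are still above the 32 GB of the V100.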

Thanks in advance, and let me know if I can provide any further details.

Jan 06 '25 11:01 kozzy97
