LLaVA
AttributeError: 'LlamaModel' object has no attribute 'vision_tower'
Whenever I run python -m llava.serve.model_worker --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path LLaVA-13B-v0 --multi-modal, I get AttributeError: 'LlamaModel' object has no attribute 'vision_tower'
NOTE: In this research preview, we used a modified version of huggingface/transformers library to support multimodal models and the LLaMA tokenizer. Make sure that you are using the correct transformers library from https://github.com/haotian-liu/transformers_llava.
I installed it and got the same error.
@Sequential-circuits @tekntrash Can you share the full error message here? This should be solved by installing the correct transformers package.
(base) @.***:~/LLaVA# python -m llava.serve.model_worker --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./LLaVA-13B-v0 --multi-modal
2023-04-19 15:47:30 | INFO | model_worker | args: Namespace(host='localhost', port=40000, worker_address='http://localhost:40000', controller_address='http://localhost:10000', model_path='./LLaVA-13B-v0', model_name=None, multi_modal=True, keep_aspect_ratio=False, num_gpus=1, limit_model_concurrency=5, stream_interval=2, no_register=False)
2023-04-19 15:47:30 | INFO | model_worker | Loading the model LLaVA-13B-v0 on worker 6c1a9a ...
2023-04-19 15:47:30.703901: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-19 15:47:32 | INFO | numexpr.utils | Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2023-04-19 15:47:32 | INFO | numexpr.utils | NumExpr defaulting to 8 threads.
Loading checkpoint shards: 100%|██████████| 3/3 [00:28<00:00, 9.50s/it]
2023-04-19 15:48:01 | ERROR | stderr |
2023-04-19 15:48:01 | ERROR | stderr | ╭─────────────── Traceback (most recent call last) ───────────────╮
│ /root/anaconda3/lib/python3.10/runpy.py:196 in _run_module_as_main
│ ❱ 196 │ return _run_code(code, main_globals, None,
│   197 │ │ │ │ │    "__main__", mod_spec)
│ /root/anaconda3/lib/python3.10/runpy.py:86 in _run_code
│ ❱  86 │ exec(code, run_globals)
│ /root/LLaVA/llava/serve/model_worker.py:361 in <module>
This is strange. Can you share the config.json
under your converted LLaVA model folder? More specifically, do you see these lines
"mm_hidden_size": 1024,
"mm_use_im_start_end": true,
"mm_vision_select_layer": -2,
"mm_vision_tower": "openai/clip-vit-large-patch14",
There you go
Out of curiosity, what is that vision tower thing?
{ "_name_or_path": "liuhaotian/LLaVA-13b-delta-v0", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 0, "eos_token_id": 1, "hidden_act": "silu", "hidden_size": 5120, "initializer_range": 0.02, "intermediate_size": 13824, "max_sequence_length": 2048, "mm_hidden_size": 1024, "mm_use_im_start_end": true, "mm_vision_select_layer": -2, "mm_vision_tower": "openai/clip-vit-large-patch14", "model_type": "llama", "num_attention_heads": 40, "num_hidden_layers": 40, "pad_token_id": -1, "rms_norm_eps": 1e-06, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.28.0.dev0", "tune_mm_mlp_adapter": false, "use_cache": true, "use_mm_proj": true, "vocab_size": 32003 }
The vision tower is the CLIP vision encoder in our architecture that supports multimodal modeling. In this first version of the research preview, we modified the transformers code base, so you need to install transformers from our repo: https://github.com/haotian-liu/transformers_llava.
It is referenced here, so since you have those arguments in the config, it should load successfully once you have installed our transformers package.
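For intuition, here is a minimal sketch of how such a vision tower is typically wired in from the mm_* config fields (this is only an illustration assuming the forked modeling code works roughly this way; the authoritative implementation lives in the haotian-liu/transformers_llava fork linked above):

```python
# Sketch only -- illustrative, not the actual fork code.
import torch
from transformers import CLIPImageProcessor, CLIPVisionModel

vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")      # mm_vision_tower
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")

def encode_image(image, select_layer=-2):
    """Return patch features from the hidden layer picked by mm_vision_select_layer."""
    inputs = image_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        out = vision_tower(**inputs, output_hidden_states=True)
    feats = out.hidden_states[select_layer]  # (1, 257, 1024) for ViT-L/14, i.e. mm_hidden_size=1024
    return feats[:, 1:]                      # keep patch tokens; dropping CLS is the common LLaVA-style choice
```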
Can you please try reinstalling the transformers library by running the following command:
pip install git+https://github.com/haotian-liu/transformers_llava.git@26356f0d07bacfb3857dafc7f8a519304b4c0572
Thanks!
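If it is unclear which transformers build actually gets imported afterwards, a quick diagnostic (hypothetical, not part of the repo) is:

```python
import transformers

print(transformers.__version__)  # the fork reports 4.28.0.dev0
print(transformers.__file__)     # should point into the forked install, not the stock PyPI package
```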
Just ran pip install @.*** and got the same error.
Below is the screenshot if it helps.
Sorry to be a pain in the ass, and probably my questions are dumb as hell, but be aware a lot of people want to jump on the ChatGPT bandwagon, so be prepared for a lot of noobs complaining they can't turn on the computer :D
Hi @tekntrash, no worries at all! We want to make sure that the current implementation is easy to use -- you are helping us make this happen!
I cannot see the screenshot though. If the screenshot does not work, can you paste the raw message here? It's best to wrap it with ``` so that it is well formatted and easier for me to read.
Another option to troubleshoot is to clone transformers and reinstall it.
Under the LLaVA code base, run the following, and please paste the logs as well:
git clone https://github.com/haotian-liu/transformers_llava.git transformers
pip install -e ./transformers
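After the editable install, one way to confirm that Python is picking up the fork rather than a stock transformers is the sanity check below (a hypothetical diagnostic, assuming the fork's modeling_llama.py is where the vision tower is defined):

```python
import inspect
from transformers.models.llama import modeling_llama

print(modeling_llama.__file__)                              # should resolve under the cloned ./transformers checkout
print("vision_tower" in inspect.getsource(modeling_llama))  # True only with the LLaVA fork installed
```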
OK, now I think it is working: it ran out of memory, but that I can handle :D
Below is the trace for both the installation of the transformers library and running the worker.
And thanks, really, for the help, and glad to be helping you too: feel free to connect on LinkedIn at https://www.linkedin.com/in/alcosta01/ or ping me on WhatsApp at +44 7892928973, and if you come to London I'll invite you for a beer.
(base) @.:~# git clone https://github.com/haotian-liu/transformers_llava.git transformers
Cloning into 'transformers'...
remote: Enumerating objects: 124725, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 124725 (delta 1), reused 5 (delta 1), pack-reused 124716
Receiving objects: 100% (124725/124725), 125.66 MiB | 25.39 MiB/s, done.
Resolving deltas: 100% (94085/94085), done.
(base) @.:~# pip install -e ./transformers
Obtaining file:///root/transformers
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: filelock in ./anaconda3/lib/python3.10/site-packages (from transformers==4.28.0.dev0) (3.9.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.11.0 in ./anaconda3/lib/python3.10/site-packages (from transformers==4.28.0.dev0) (0.13.3)
Requirement already satisfied: numpy>=1.17 in ./anaconda3/lib/python3.10/site-packages (from transformers==4.28.0.dev0) (1.21.6)
Requirement already satisfied: packaging>=20.0 in ./anaconda3/lib/python3.10/site-packages (from transformers==4.28.0.dev0) (22.0)
Requirement already satisfied: pyyaml>=5.1 in ./anaconda3/lib/python3.10/site-packages (from transformers==4.28.0.dev0) (6.0)
Requirement already satisfied: regex!=2019.12.17 in ./anaconda3/lib/python3.10/site-packages (from transformers==4.28.0.dev0) (2017.4.5)
Requirement already satisfied: requests in ./anaconda3/lib/python3.10/site-packages (from transformers==4.28.0.dev0) (2.27.1)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in ./anaconda3/lib/python3.10/site-packages (from transformers==4.28.0.dev0) (0.12.1)
Requirement already satisfied: tqdm>=4.27 in ./anaconda3/lib/python3.10/site-packages (from transformers==4.28.0.dev0) (4.64.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./anaconda3/lib/python3.10/site-packages (from huggingface-hub<1.0,>=0.11.0->transformers==4.28.0.dev0) (4.4.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./anaconda3/lib/python3.10/site-packages (from requests->transformers==4.28.0.dev0) (1.26.14)
Requirement already satisfied: certifi>=2017.4.17 in ./anaconda3/lib/python3.10/site-packages (from requests->transformers==4.28.0.dev0) (2022.12.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in ./anaconda3/lib/python3.10/site-packages (from requests->transformers==4.28.0.dev0) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in ./anaconda3/lib/python3.10/site-packages (from requests->transformers==4.28.0.dev0) (3.4)
Building wheels for collected packages: transformers
  Building editable for transformers (pyproject.toml) ... done
  Created wheel for transformers: filename=transformers-4.28.0.dev0-0.editable-py3-none-any.whl size=35072 sha256=39e7c8c031e544b6d921021aa7a8966069161c58bcad2aa1aeb696c4baf058be
  Stored in directory: /tmp/pip-ephem-wheel-cache-ja8m1x14/wheels/cc/64/f7/a67713e0143d17a61a8c81af64dffa96f04d6602a4e4d50e71
Successfully built transformers
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.28.0.dev0
    Uninstalling transformers-4.28.0.dev0:
      Successfully uninstalled transformers-4.28.0.dev0
Successfully installed transformers-4.28.0.dev0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
(base) @.:~# cd LLaVA
(base) @.***:~/LLaVA# python -m llava.serve.model_worker --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ./LLaVA-13B-v0 --multi-modal
2023-04-19 16:35:32 | INFO | model_worker | args: Namespace(host='localhost', port=40000, worker_address='http://localhost:40000', controller_address='http://localhost:10000', model_path='./LLaVA-13B-v0', model_name=None, multi_modal=True, keep_aspect_ratio=False, num_gpus=1, limit_model_concurrency=5, stream_interval=2, no_register=False)
2023-04-19 16:35:32 | INFO | model_worker | Loading the model LLaVA-13B-v0 on worker 5517a8 ...
2023-04-19 16:35:33.600890: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-19 16:35:34 | INFO | numexpr.utils | Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2023-04-19 16:35:34 | INFO | numexpr.utils | NumExpr defaulting to 8 threads.
Downloading (…)lve/main/config.json: 100%|██████████| 4.52k/4.52k [00:00<00:00, 5.54MB/s]
2023-04-19 16:35:36 | ERROR | stderr |
Downloading pytorch_model.bin: 100%|██████████| 1.71G/1.71G [00:15<00:00, 109MB/s]
2023-04-19 16:35:52 | ERROR | stderr |
Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPVisionModel: ['text_model.encoder.layers.0.self_attn.q_proj.weight', 'text_model.embeddings.position_ids', 'text_model.encoder.layers.3.self_attn.out_proj.weight', 'text_model.encoder.layers.2.self_attn.out_proj.bias', 'text_model.encoder.layers.11.self_attn.k_proj.weight',
'text_model.encoder.layers.8.self_attn.v_proj.bias', 'text_model.final_layer_norm.weight', 'text_model.encoder.layers.11.self_attn.out_proj.bias', 'text_model.encoder.layers.8.self_attn.q_proj.weight', 'text_model.encoder.layers.10.self_attn.k_proj.bias', 'text_model.encoder.layers.2.self_attn.q_proj.weight', 'text_model.encoder.layers.1.self_attn.out_proj.weight', 'text_model.encoder.layers.11.self_attn.k_proj.bias', 'text_model.encoder.layers.11.layer_norm2.weight', 'text_model.encoder.layers.7.self_attn.v_proj.bias', 'text_model.encoder.layers.2.self_attn.k_proj.weight', 'text_model.encoder.layers.2.mlp.fc2.weight', 'text_model.encoder.layers.2.layer_norm2.bias', 'text_model.encoder.layers.6.self_attn.v_proj.bias', 'text_model.encoder.layers.6.mlp.fc1.weight', 'text_model.encoder.layers.7.layer_norm1.bias', 'text_model.encoder.layers.7.self_attn.out_proj.bias', 'text_model.encoder.layers.0.self_attn.k_proj.weight', 'text_model.encoder.layers.0.layer_norm2.weight', 'text_model.encoder.layers.10.mlp.fc1.weight', 'text_model.encoder.layers.0.layer_norm2.bias', 'text_model.encoder.layers.6.self_attn.q_proj.weight', 'text_model.encoder.layers.2.mlp.fc2.bias', 'text_model.encoder.layers.0.mlp.fc2.bias', 'text_model.encoder.layers.3.mlp.fc2.weight', 'text_projection.weight', 'text_model.encoder.layers.11.self_attn.v_proj.weight', 'text_model.encoder.layers.1.self_attn.q_proj.bias', 'text_model.encoder.layers.9.layer_norm1.weight', 'text_model.encoder.layers.8.self_attn.out_proj.weight', 'text_model.encoder.layers.5.mlp.fc2.weight', 'text_model.encoder.layers.11.self_attn.out_proj.weight', 'text_model.encoder.layers.9.self_attn.k_proj.weight', 'text_model.encoder.layers.11.mlp.fc2.bias', 'text_model.embeddings.token_embedding.weight', 'text_model.encoder.layers.5.mlp.fc1.weight', 'logit_scale', 'text_model.encoder.layers.8.mlp.fc2.weight', 'text_model.encoder.layers.10.layer_norm2.bias', 'text_model.encoder.layers.5.self_attn.v_proj.weight', 'text_model.encoder.layers.0.mlp.fc1.bias', 'text_model.encoder.layers.4.self_attn.v_proj.bias', 'text_model.encoder.layers.0.self_attn.q_proj.bias', 'text_model.encoder.layers.4.mlp.fc2.weight', 'text_model.encoder.layers.11.layer_norm1.weight', 'text_model.encoder.layers.3.self_attn.v_proj.bias', 'text_model.encoder.layers.3.layer_norm2.weight', 'text_model.encoder.layers.7.self_attn.k_proj.bias', 'text_model.encoder.layers.10.mlp.fc2.bias', 'text_model.encoder.layers.7.layer_norm1.weight', 'text_model.encoder.layers.11.mlp.fc2.weight', 'text_model.encoder.layers.3.self_attn.out_proj.bias', 'text_model.encoder.layers.2.mlp.fc1.bias', 'text_model.embeddings.position_embedding.weight', 'text_model.encoder.layers.1.layer_norm1.weight', 'text_model.encoder.layers.7.self_attn.q_proj.bias', 'text_model.encoder.layers.6.self_attn.out_proj.weight', 'text_model.encoder.layers.3.self_attn.k_proj.bias', 'text_model.encoder.layers.5.mlp.fc2.bias', 'text_model.encoder.layers.6.self_attn.v_proj.weight', 'text_model.encoder.layers.10.self_attn.v_proj.bias', 'text_model.encoder.layers.6.self_attn.out_proj.bias', 'text_model.encoder.layers.6.mlp.fc2.bias', 'text_model.encoder.layers.3.layer_norm1.weight', 'text_model.encoder.layers.3.layer_norm2.bias', 'text_model.encoder.layers.8.mlp.fc1.weight', 'text_model.encoder.layers.4.self_attn.v_proj.weight', 'text_model.encoder.layers.4.layer_norm1.weight', 'text_model.encoder.layers.7.mlp.fc2.bias', 'text_model.encoder.layers.6.mlp.fc2.weight', 'text_model.encoder.layers.9.self_attn.v_proj.bias', 
'text_model.encoder.layers.10.mlp.fc2.weight', 'text_model.encoder.layers.10.layer_norm1.bias', 'text_model.final_layer_norm.bias', 'text_model.encoder.layers.9.self_attn.out_proj.bias', 'text_model.encoder.layers.3.self_attn.v_proj.weight', 'text_model.encoder.layers.10.layer_norm2.weight', 'text_model.encoder.layers.1.layer_norm1.bias', 'text_model.encoder.layers.8.layer_norm1.bias', 'text_model.encoder.layers.10.self_attn.q_proj.bias', 'text_model.encoder.layers.4.self_attn.out_proj.weight', 'text_model.encoder.layers.6.layer_norm2.weight', 'text_model.encoder.layers.3.self_attn.q_proj.weight', 'text_model.encoder.layers.7.mlp.fc2.weight', 'text_model.encoder.layers.4.self_attn.k_proj.bias', 'text_model.encoder.layers.2.self_attn.v_proj.weight', 'text_model.encoder.layers.4.layer_norm2.weight', 'text_model.encoder.layers.7.self_attn.q_proj.weight', 'text_model.encoder.layers.9.layer_norm2.bias', 'text_model.encoder.layers.2.self_attn.out_proj.weight', 'text_model.encoder.layers.6.self_attn.q_proj.bias', 'text_model.encoder.layers.8.self_attn.out_proj.bias', 'text_model.encoder.layers.6.layer_norm2.bias', 'text_model.encoder.layers.7.layer_norm2.bias', 'text_model.encoder.layers.11.layer_norm1.bias', 'text_model.encoder.layers.0.self_attn.k_proj.bias', 'text_model.encoder.layers.5.self_attn.k_proj.bias', 'text_model.encoder.layers.10.mlp.fc1.bias', 'text_model.encoder.layers.9.layer_norm1.bias', 'text_model.encoder.layers.9.self_attn.out_proj.weight', 'text_model.encoder.layers.3.layer_norm1.bias', 'text_model.encoder.layers.4.layer_norm1.bias', 'text_model.encoder.layers.6.self_attn.k_proj.bias', 'text_model.encoder.layers.9.mlp.fc1.bias', 'text_model.encoder.layers.3.mlp.fc1.weight', 'text_model.encoder.layers.11.self_attn.q_proj.weight', 'text_model.encoder.layers.0.layer_norm1.weight', 'text_model.encoder.layers.7.mlp.fc1.weight', 'text_model.encoder.layers.9.self_attn.q_proj.bias', 'text_model.encoder.layers.2.self_attn.k_proj.bias', 'text_model.encoder.layers.4.mlp.fc1.bias', 'text_model.encoder.layers.11.mlp.fc1.bias', 'text_model.encoder.layers.10.self_attn.v_proj.weight', 'text_model.encoder.layers.5.layer_norm1.weight', 'text_model.encoder.layers.1.mlp.fc2.bias', 'visual_projection.weight', 'text_model.encoder.layers.10.self_attn.out_proj.weight', 'text_model.encoder.layers.2.self_attn.q_proj.bias', 'text_model.encoder.layers.8.self_attn.k_proj.weight', 'text_model.encoder.layers.8.mlp.fc2.bias', 'text_model.encoder.layers.2.self_attn.v_proj.bias', 'text_model.encoder.layers.4.self_attn.k_proj.weight', 'text_model.encoder.layers.11.self_attn.v_proj.bias', 'text_model.encoder.layers.11.mlp.fc1.weight', 'text_model.encoder.layers.0.self_attn.v_proj.weight', 'text_model.encoder.layers.5.self_attn.out_proj.bias', 'text_model.encoder.layers.10.self_attn.q_proj.weight', 'text_model.encoder.layers.10.layer_norm1.weight', 'text_model.encoder.layers.8.layer_norm1.weight', 'text_model.encoder.layers.4.mlp.fc1.weight', 'text_model.encoder.layers.5.mlp.fc1.bias', 'text_model.encoder.layers.2.layer_norm1.bias', 'text_model.encoder.layers.5.layer_norm1.bias', 'text_model.encoder.layers.0.self_attn.out_proj.bias', 'text_model.encoder.layers.3.self_attn.k_proj.weight', 'text_model.encoder.layers.9.mlp.fc2.weight', 'text_model.encoder.layers.6.layer_norm1.weight', 'text_model.encoder.layers.8.layer_norm2.weight', 'text_model.encoder.layers.1.mlp.fc1.weight', 'text_model.encoder.layers.1.mlp.fc2.weight', 'text_model.encoder.layers.2.layer_norm2.weight', 
'text_model.encoder.layers.7.self_attn.k_proj.weight', 'text_model.encoder.layers.9.self_attn.k_proj.bias', 'text_model.encoder.layers.3.mlp.fc1.bias', 'text_model.encoder.layers.7.self_attn.out_proj.weight', 'text_model.encoder.layers.9.layer_norm2.weight', 'text_model.encoder.layers.6.layer_norm1.bias', 'text_model.encoder.layers.1.self_attn.q_proj.weight', 'text_model.encoder.layers.5.layer_norm2.weight', 'text_model.encoder.layers.9.mlp.fc1.weight', 'text_model.encoder.layers.0.self_attn.v_proj.bias', 'text_model.encoder.layers.8.self_attn.v_proj.weight', 'text_model.encoder.layers.1.self_attn.v_proj.weight', 'text_model.encoder.layers.5.self_attn.k_proj.weight', 'text_model.encoder.layers.0.mlp.fc2.weight', 'text_model.encoder.layers.5.layer_norm2.bias', 'text_model.encoder.layers.1.self_attn.out_proj.bias', 'text_model.encoder.layers.0.layer_norm1.bias', 'text_model.encoder.layers.8.self_attn.k_proj.bias', 'text_model.encoder.layers.8.self_attn.q_proj.bias', 'text_model.encoder.layers.8.mlp.fc1.bias', 'text_model.encoder.layers.4.self_attn.out_proj.bias', 'text_model.encoder.layers.8.layer_norm2.bias', 'text_model.encoder.layers.9.self_attn.v_proj.weight', 'text_model.encoder.layers.9.self_attn.q_proj.weight', 'text_model.encoder.layers.0.mlp.fc1.weight', 'text_model.encoder.layers.1.self_attn.v_proj.bias', 'text_model.encoder.layers.7.self_attn.v_proj.weight', 'text_model.encoder.layers.0.self_attn.out_proj.weight', 'text_model.encoder.layers.1.layer_norm2.bias', 'text_model.encoder.layers.5.self_attn.q_proj.bias', 'text_model.encoder.layers.5.self_attn.v_proj.bias', 'text_model.encoder.layers.11.self_attn.q_proj.bias', 'text_model.encoder.layers.1.self_attn.k_proj.bias', 'text_model.encoder.layers.1.mlp.fc1.bias', 'text_model.encoder.layers.1.self_attn.k_proj.weight', 'text_model.encoder.layers.7.layer_norm2.weight', 'text_model.encoder.layers.2.mlp.fc1.weight', 'text_model.encoder.layers.4.self_attn.q_proj.weight', 'text_model.encoder.layers.3.self_attn.q_proj.bias', 'text_model.encoder.layers.10.self_attn.out_proj.bias', 'text_model.encoder.layers.4.self_attn.q_proj.bias', 'text_model.encoder.layers.4.mlp.fc2.bias', 'text_model.encoder.layers.6.self_attn.k_proj.weight', 'text_model.encoder.layers.1.layer_norm2.weight', 'text_model.encoder.layers.10.self_attn.k_proj.weight', 'text_model.encoder.layers.5.self_attn.q_proj.weight', 'text_model.encoder.layers.9.mlp.fc2.bias', 'text_model.encoder.layers.2.layer_norm1.weight', 'text_model.encoder.layers.7.mlp.fc1.bias', 'text_model.encoder.layers.6.mlp.fc1.bias', 'text_model.encoder.layers.11.layer_norm2.bias', 'text_model.encoder.layers.4.layer_norm2.bias', 'text_model.encoder.layers.5.self_attn.out_proj.weight', 'text_model.encoder.layers.3.mlp.fc2.bias']
- This IS expected if you are initializing CLIPVisionModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPVisionModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Loading checkpoint shards: 100%|██████████| 3/3 [00:12<00:00, 4.21s/it]
2023-04-19 16:36:06 | ERROR | stderr |
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at ./LLaVA-13B-v0 and are newly initialized: ['model.mm_projector.weight', 'model.mm_projector.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPVisionModel: [same list of unused text_model.* weights as above]
- This IS expected if you are initializing CLIPVisionModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPVisionModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2023-04-19 16:36:14 | ERROR | stderr | ╭─────────────── Traceback (most recent call last) ───────────────╮
│ /root/anaconda3/lib/python3.10/runpy.py:196 in _run_module_as_main
│ ❱ 196 │ return _run_code(code, main_globals, None,
│   197 │ │ │ │ │    "__main__", mod_spec)
│ /root/anaconda3/lib/python3.10/runpy.py:86 in _run_code
│ ❱  86 │ exec(code, run_globals)
│ /root/LLaVA/llava/serve/model_worker.py:361 in <module>
│   358 │ args = parser.parse_args()
│   359 │ logger.info(f"args: {args}")
│   360 │
│ ❱ 361 │ worker = ModelWorker(args.controller_address,
│   362 │ │ │ │ │ │   args.worker_address,
│   363 │ │ │ │ │ │   worker_id,
│   364 │ │ │ │ │ │   args.no_register,
│ /root/LLaVA/llava/serve/model_worker.py:118 in __init__
│   115 │ │ logger.info(f"Loading the model {self.model_name} on worker {worker_id} ...")
│   116 │ │ self.is_multi_modal = is_multi_modal
│   117 │ │ self.keep_aspect_ratio = keep_aspect_ratio
│ ❱ 118 │ │ self.tokenizer, self.model, self.image_processor, self.context_len = load_model(
│   119 │ │ │ model_path, num_gpus, is_multi_modal)
│ /root/LLaVA/llava/serve/model_worker.py:85 in load_model
│    82 │ │ │ vision_config.im_start_token, vision_config.im_end_token = tokenizer.convert
│    83 │
│    84 │ if num_gpus == 1:
│ ❱  85 │ │ model.cuda()
│    86 │
│    87 │ if hasattr(model.config, "max_sequence_length"):
│    88 │ │ context_len = model.config.max_sequence_length
│ /root/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:905 in cuda
│ ❱ 905 │ │ return self._apply(lambda t: t.cuda(device))
│ /root/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:797 in _apply (repeated for each nested module)
│ ❱ 797 │ │ │ module._apply(fn)
│ /root/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:820 in _apply
│ ❱ 820 │ │ │ │ param_applied = fn(param)
│ /root/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py:905 in <lambda>
│ ❱ 905 │ │ return self._apply(lambda t: t.cuda(device))
╰──────────────────────────────────────────────────────────────────╯
2023-04-19 16:36:14 | ERROR | stderr | OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB (GPU 0; 23.87 GiB total capacity; 23.71 GiB already allocated; 19.62 MiB free; 23.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
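As the error text itself suggests, allocator fragmentation can sometimes be reduced by setting PYTORCH_CUDA_ALLOC_CONF before CUDA is initialized. A hedged example is below; the value is only illustrative, and it will not help if the model genuinely needs more VRAM than the card has:

```python
# Set before importing torch / before the worker initializes CUDA; value is illustrative.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```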
What's the minimum VRAM needed, and would bitsandbytes help lower the VRAM requirement?
@tekntrash @kagevazquez
I updated the code base today to support inference on 2x 3090s. It currently needs around 28GB of GPU memory for inference.
https://github.com/haotian-liu/LLaVA#launch-a-model-worker-multiple-gpus-when-gpu-vram--24gb
bitsandbytes will definitely help, and we'll update instructions on that as well this week.
bitsandbytes sounds great. I haven't actually tried LLaVA inference yet, but I see no reason why it shouldn't be hackable to work on just one 3090 (which I accomplished with MiniGPT-4 previously).
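No bitsandbytes instructions had been posted in the thread at this point, but for orientation, 8-bit loading through the generic Hugging Face transformers interface looks roughly like the sketch below. This is not the official LLaVA recipe (the worker has its own loading path, and the multimodal model class comes from the modified transformers fork); it only illustrates why 8-bit weights make a 13B model plausible on a single 24 GB card.

```python
# Rough sketch of generic 8-bit loading with bitsandbytes via transformers.
# Not the LLaVA worker's actual loading code; model_path is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./LLaVA-13B-v0"  # placeholder: your converted checkpoint folder

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,   # requires `pip install bitsandbytes accelerate`
    device_map="auto",   # let accelerate place layers across available GPUs/CPU
)
# 13B parameters at ~1 byte each is ~13 GB of weights, versus ~26 GB in fp16,
# which is what brings single-24GB-card inference into reach.
```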
This is strange. Can you share the config.json under your converted LLaVA model folder? More specifically, do you see these lines:
"mm_hidden_size": 1024,
"mm_use_im_start_end": true,
"mm_vision_select_layer": -2,
"mm_vision_tower": "openai/clip-vit-large-patch14",
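For what it's worth, here is a small hedged diagnostic (not part of the repo) for checking whether the converted checkpoint's config.json actually carries these multimodal fields; if they are missing, the checkpoint is loaded as a plain LLaMA model with no vision_tower:

```python
# Quick check (not from the LLaVA repo): print the multimodal fields from the
# converted checkpoint's config.json. Adjust the path to your model folder.
import json
from pathlib import Path

config_path = Path("./LLaVA-13B-v0") / "config.json"
config = json.loads(config_path.read_text())

for key in ("mm_hidden_size", "mm_use_im_start_end",
            "mm_vision_select_layer", "mm_vision_tower"):
    print(f"{key}: {config.get(key, 'MISSING')}")
```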
I have updated the codebase to v0.1 but get the same error when launching a model worker; the error says "AttributeError: 'LlamaModel' object has no attribute 'vision_tower'".
Hi @microhu, can you share the command you use and the full error log? This can help me better figure out what the issue you are facing. Thanks.
It happens when I launch a model worker:
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path /mnt/dolphinfs/hdd_pool/docker/user/hadoop-mt-ocr/huwenping/HuggingFace_models/LLaVA-13b-v0 --multi-modal
2023-05-04 23:09:27 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=40000, worker_address='http://localhost:40000', controller_address='http://localhost:10000', model_path='/mnt/dolphinfs/hdd_pool/docker/user/hadoop-mt-ocr/huwenping/HuggingFace_models/LLaVA-13b-v0', model_name=None, multi_modal=True, keep_aspect_ratio=False, num_gpus=1, limit_model_concurrency=5, stream_interval=2, no_register=False)
2023-05-04 23:09:27 | INFO | model_worker | Loading the model LLaVA-13b-v0 on worker c64123 ...
Loading checkpoint shards:   0%|                                                                | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:  33%|█████████████████████▎                                          | 1/3 [00:29<00:59, 29.67s/it]
Loading checkpoint shards:  67%|██████████████████████████████████████████▋                     | 2/3 [00:47<00:22, 22.45s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 3/3 [01:08<00:00, 22.06s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 3/3 [01:08<00:00, 22.90s/it]
the error log:
2023-05-04 23:10:40 | ERROR | stderr | Traceback (most recent call last):
2023-05-04 23:10:40 | ERROR | stderr |   File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mt-ocr/huwenping/conda_envs/llava/lib/python3.10/runpy.py", line 196, in _run_module_as_main
2023-05-04 23:10:40 | ERROR | stderr |     return _run_code(code, main_globals, None,
2023-05-04 23:10:40 | ERROR | stderr |   File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mt-ocr/huwenping/conda_envs/llava/lib/python3.10/runpy.py", line 86, in _run_code
2023-05-04 23:10:40 | ERROR | stderr |     exec(code, run_globals)
2023-05-04 23:10:40 | ERROR | stderr |   File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-mt-ocr/huwenping/Repos/LLaVA/llava/serve/model_worker.py", line 363, in
2023-05-04 23:10:40 | ERROR | stderr |     worker = ModelWorker(args.controller_address,
2023-05-04 23:10:40 | ERROR | stderr |   File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-mt-ocr/huwenping/Repos/LLaVA/llava/serve/model_worker.py", line 120, in __init__
2023-05-04 23:10:40 | ERROR | stderr |     self.tokenizer, self.model, self.image_processor, self.context_len = load_model(
2023-05-04 23:10:40 | ERROR | stderr |   File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-mt-ocr/huwenping/Repos/LLaVA/llava/serve/model_worker.py", line 73, in load_model
2023-05-04 23:10:40 | ERROR | stderr |     vision_tower = model.model.vision_tower[0]
2023-05-04 23:10:40 | ERROR | stderr |   File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mt-ocr/huwenping/conda_envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
2023-05-04 23:10:40 | ERROR | stderr |     raise AttributeError("'{}' object has no attribute '{}'".format(
2023-05-04 23:10:40 | ERROR | stderr | AttributeError: 'LlamaModel' object has no attribute 'vision_tower'
@haotian-liu above is the error message. And one more question: what is the required environment to install flash-attn? I tried different configurations but failed to install it successfully.
Hi @microhu, can you try pulling the latest repo again? It seems that your code base has not been upgraded to v0.1 yet. For example, you added --multi-modal, which is totally fine, but the latest code base would print out a warning message here in that case. Please also make sure that transformers is the correct version, following the instructions here. Thanks!
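As a hedged side note (not an official debugging step from the repo), a quick way to confirm which transformers build is actually being imported, since a stock install loads this checkpoint as a plain LlamaModel without a vision_tower attribute:

```python
# Hedged diagnostic: confirm which `transformers` package the worker imports.
# A stock PyPI install (rather than the modified fork installed with
# `pip install -e ./transformers`) is the usual cause of the missing vision_tower.
import transformers

print("transformers version:", transformers.__version__)
print("imported from:", transformers.__file__)

# After the model has been loaded in a Python session, the inner module should
# expose the vision tower when the correct fork and code base are in use, e.g.:
#   print(hasattr(model.model, "vision_tower"))
```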
@microhu also, regarding flash-attn: if you create a new environment following the installation commands, it should not usually raise issues. You may paste your flash-attn error log here.
@haotian-liu Thanks for your quick reply. The error message is quite long; it looks like this:
ptxas info : Used 201 registers, 632 bytes cmem[0], 16 bytes cmem[2]
ptxas info : Compiling entry function '_Z29fmha_bwd_dq_dk_dv_loop_kernelI18FMHA_kernel_traitsILi128ELi64ELi16ELi1ELi8ELj8E13__nv_bfloat16ELb1ELb0ELin1EEv17FMHA_dgrad_params' for 'sm_80'
ptxas info : Function properties for _Z29fmha_bwd_dq_dk_dv_loop_kernelI18FMHA_kernel_traitsILi128ELi64ELi16ELi1ELi8ELj8E13__nv_bfloat16ELb1ELb0ELin1EEv17FMHA_dgrad_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 225 registers, 632 bytes cmem[0], 16 bytes cmem[2]
ptxas info : Compiling entry function '_Z29fmha_bwd_dq_dk_dv_loop_kernelI18FMHA_kernel_traitsILi128ELi64ELi16ELi1ELi8ELj8E13__nv_bfloat16ELb1ELb1ELin1EEv17FMHA_dgrad_params' for 'sm_80'
ptxas info : Function properties for _Z29fmha_bwd_dq_dk_dv_loop_kernelI18FMHA_kernel_traitsILi128ELi64ELi16ELi1ELi8ELj8E13__nv_bfloat16ELb1ELb1ELin1EEv17FMHA_dgrad_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 229 registers, 632 bytes cmem[0], 16 bytes cmem[2]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mt-ocr/huwenping/conda_envs/fastchat_native/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mt-ocr/huwenping/conda_envs/fastchat_native/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-wvh_2hdw/flash-attn_f250c79ec2304275af0cdf40b9a09c77/setup.py", line 163, in <module>
setup(
File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mt-ocr/huwenping/conda_envs/fastchat_native/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mt-ocr/huwenping/conda_envs/fastchat_native/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mt-ocr/huwenping/conda_envs/fastchat_native/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mt-ocr/huwenping/conda_envs/fastchat_native/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mt-ocr/huwenping/conda_envs/fastchat_native/lib/python3.8/site-packages/setuptools/command/install.py", line 61, in run
@microhu I do not find the exact error in this log; maybe you can upload it to a gist, or reach out to the flash-attn repo for help as well. And are you able to launch the model worker successfully after upgrading?
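Since the build log above fails inside ninja/ptxas rather than in Python, one hedged pre-flight check (not an official flash-attn requirement list) is to verify the CUDA toolchain PyTorch was built against and the GPU architecture before compiling:

```python
# Hedged pre-flight check before building flash-attn from source.
import shutil
import torch

print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)   # should match the nvcc on PATH
print("nvcc on PATH:", shutil.which("nvcc"))
print("ninja on PATH:", shutil.which("ninja"))

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    # The kernels in the log above are compiled for sm_80 (Ampere); older GPUs
    # may need a different flash-attn build or can simply skip flash-attn.
    print(f"GPU 0 compute capability: sm_{major}{minor}")
```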
Hi, I am facing the same issue with the latest code and 13B v0 model.
The CLI script can load the 13B v1-1 model without problems, but when I try to load the 13B v0 model, it gives:
vision_tower = model.model.vision_tower[0]
File "xxxxxxxxxxxxxxx/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LlamaModel' object has no attribute 'vision_tower'
I guess there is a version conflict.
Hi @penghe2021, are you using the latest code base? You may need to use the latest code base in order to load both v0 and v1 checkpoints. Also, since you only provided a partial log, I am assuming this is from model_worker.py?
Hi Liu. Yes, I am using the latest code, but I just realized that I haven't updated the 13B v0 model. Let me try to download the latest v0 model and try with the code.
@penghe2021 Oh, you do not need to update the 13B model in order to use the latest code. And just to confirm, are you seeing this error in model_worker.py?
@penghe2021 I just verified that the latest code works with the v0 model. Do you see this line when you load your model? If you see this line, then the code should be correct and running fine.
And please make sure that the model name includes "llava": if 'llava' in model_path.lower():
2023-05-09 18:18:42 | INFO | model_worker | Loading the model LLaVA-13B-v0 on worker ae4bec ...
You are using a model of type llama to instantiate a model of type llava. This is not supported for all configurations of models and can yield errors.
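In other words, the worker picks the multimodal loading path purely from the folder name, so here is a simplified illustration (not the exact model_worker.py code) of why a renamed folder silently skips the vision tower:

```python
# Simplified illustration of the name-based dispatch; not the exact worker code.
def is_llava_checkpoint(model_path: str) -> bool:
    # The worker checks the folder name, not the config, so a converted model
    # saved under a name without "llava" falls back to plain LLaMA loading.
    return "llava" in model_path.lower()

print(is_llava_checkpoint("./LLaVA-13B-v0"))   # True  -> multimodal branch
print(is_llava_checkpoint("./vicuna-13b-v0"))  # False -> text-only branch
```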
I am using the old version of the CLI code; maybe this is the issue. Give me some time to try with the new code.
UPDATE: after I download the latest 13B delta v0 weight and regenerate the llava model, it works without problem. Thanks for the help!
@microhu I do not find the exact error in this log; maybe you can upload it to a gist, or reach out to the flash-attn repo for help as well. And are you able to launch the model worker successfully after upgrading?
Yes, I solved all the issues by reinstalling the whole project in a Docker container with a newer CUDA driver. Thanks!