VILA

ValueError: The checkpoint you are trying to load has model type `llava_llama` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

Open · eternal8080 opened this issue 1 year ago · 12 comments

I just followed the steps, but when I run the following code:

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Efficient-Large-Model/Llama-3-VILA1.5-8B")

it raises an error: ValueError: The checkpoint you are trying to load has model type `llava_llama` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

My transformers version is 4.37.2.

eternal8080 · Sep 06 '24 12:09

This is usually caused by VILA not being properly installed.

Lyken17 · Sep 08 '24 00:09

The version of transformers is not correct.

Davidup1 · Sep 08 '24 11:09

The version of transformers is not correct.

Hello, may I ask what the transformers version should be?

gehong-coder · Sep 08 '24 12:09

Try 4.36.2? It runs well with VILA1.5. @gehong-coder
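
A quick way to double-check which version is actually installed (a trivial sanity check of my own, not from the repo):

# Verify the installed transformers version matches the 4.36.2 pin
# that VILA 1.5's pyproject.toml (quoted below) expects.
import transformers

print(transformers.__version__)
assert transformers.__version__ == "4.36.2", "VILA 1.5 pins transformers==4.36.2"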

Davidup1 · Sep 08 '24 12:09

Try 4.36.2? It runs well with VILA1.5. @gehong-coder

Hello, I used pip install git+https://github.com/huggingface/transformers@v4.36.2 and got:

/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/vila/lib/python3.10/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/vila/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
[2024-09-08 20:53:49,139] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/vila/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Traceback (most recent call last):
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/vila/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/vila/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/VILA/llava/__init__.py", line 1, in <module>
    from .entry import *
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/VILA/llava/entry.py", line 7, in <module>
    from llava.model.builder import load_pretrained_model
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/VILA/llava/model/__init__.py", line 1, in <module>
    from .language_model.llava_llama import LlavaLlamaConfig, LlavaLlamaModel
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/VILA/llava/model/language_model/llava_llama.py", line 28, in <module>
    from ..llava_arch import LlavaMetaForCausalLM, LlavaMetaModel
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/VILA/llava/model/llava_arch.py", line 41, in <module>
    from llava.model.multimodal_encoder.builder import build_vision_tower
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/VILA/llava/model/multimodal_encoder/builder.py", line 26, in <module>
    from .siglip_encoder import SiglipVisionTower, SiglipVisionTowerS2
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/VILA/llava/model/multimodal_encoder/siglip_encoder.py", line 18, in <module>
    from transformers import PretrainedConfig, SiglipImageProcessor, SiglipVisionModel
ImportError: cannot import name 'SiglipImageProcessor' from 'transformers' (/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/vila/lib/python3.10/site-packages/transformers/__init__.py)

What should I do? Thanks.

gehong-coder · Sep 08 '24 12:09

@gehong-coder Oh, maybe the repository has been upgraded. You can follow the original version of the setup guide:

conda create -n vila python=3.10 -y
conda activate vila

pip install --upgrade pip  # enable PEP 660 support
# this is optional if you prefer to use the system built-in nvcc.
conda install -c nvidia cuda-toolkit -y
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.4.2/flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.4.2+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install -e .
pip install -e ".[train]"

pip install git+https://github.com/huggingface/transformers@v4.36.2
site_pkg_path=$(python -c 'import site; print(site.getsitepackages()[0])')
cp -rv ./llava/train/transformers_replace/* $site_pkg_path/transformers/
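
After the copy step, a quick smoke test (my own addition, not part of the guide) confirms the patched files took effect, assuming the transformers_replace files provide the SigLIP classes missing from stock 4.36.2, which is what the ImportError above suggests:

# This import should succeed only after transformers_replace has been copied
# into site-packages; stock transformers 4.36.2 raises the ImportError above.
from transformers import SiglipImageProcessor, SiglipVisionModel

print("SigLIP classes available, patch applied")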

Davidup1 · Sep 08 '24 13:09

And the pyproject.toml:

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "vila"
version = "1.0.0"
description = "VILA: On Pre-training for Visual Language Models"
readme = "README.md"
requires-python = ">=3.8"
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: Apache Software License",
]
dependencies = [
    "torch==2.0.1", "torchvision==0.15.2",
    "transformers==4.36.2", "tokenizers>=0.15.2", "sentencepiece==0.1.99",
    "shortuuid", "accelerate==0.27.2", "peft==0.5.0", "bitsandbytes==0.41.0",
    "pydantic<2,>=1", "markdown2[all]", "numpy", "scikit-learn==1.2.2",
    "gradio==3.35.2", "gradio_client==0.2.9", "requests", "httpx==0.24.0",
    "uvicorn", "fastapi", "einops==0.6.1", "einops-exts==0.0.4", "timm==0.9.12",
    "openpyxl==3.1.2", "pytorchvideo==0.1.5", "decord==0.6.0",
    "datasets==2.16.1", "openai==1.8.0", "webdataset==0.2.86",
    "nltk==3.3", "pywsd==1.2.4", "opencv-python==4.8.0.74",
    "s2wrapper@git+https://github.com/bfshi/scaling_on_scales",
]

[project.optional-dependencies]
train = ["deepspeed==0.9.5", "ninja", "wandb"]
eval = ["mmengine", "word2number", "Levenshtein", "nltk", "pywsd"]

[project.urls]
"Homepage" = "https://hanlab.mit.edu/projects/vila"
"Bug Tracker" = "https://github.com/Efficient-Large-Model/VILA/issues"

[tool.setuptools.packages.find]
exclude = ["assets*", "benchmark*", "docs", "dist*", "playground*", "scripts*", "tests*"]

[tool.wheel]
exclude = ["assets*", "benchmark*", "docs", "dist*", "playground*", "scripts*", "tests*"]

Davidup1 · Sep 08 '24 13:09

Hello, there are still errors. Why is this environment so difficult to build? The VILA1.5 model I downloaded cannot be loaded. Isn't this code version universal?

Traceback (most recent call last):
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/opensora/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/opensora/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/VILA/llava/eval/run_vila.py", line 157, in <module>
    eval_model(args)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/VILA/llava/eval/run_vila.py", line 68, in eval_model
    tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, model_name, args.model_base)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/VILA/llava/model/builder.py", line 151, in load_pretrained_model
    model = LlavaLlamaModel(config=config, low_cpu_mem_usage=True, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/VILA/llava/model/language_model/llava_llama.py", line 43, in __init__
    return self.init_vlm(config=config, *args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/VILA/llava/model/llava_arch.py", line 61, in init_vlm
    cfgs = get_model_config(config)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/workspace/github/VILA/llava/model/utils.py", line 36, in get_model_config
    valid_hf_repo = repo_exists(root_path)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/opensora/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/opensora/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 2634, in repo_exists
    self.repo_info(repo_id=repo_id, repo_type=repo_type, token=token)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/opensora/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/opensora/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 2588, in repo_info
    return method(
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/opensora/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/opensora/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 2372, in model_info
    r = get_session().get(path, headers=headers, timeout=timeout, params=params)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/opensora/lib/python3.10/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/opensora/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/opensora/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/opensora/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 66, in send
    return super().send(request, *args, **kwargs)
  File "/mnt/nfs/bj4-v100-1/data1/hong.ge/miniconda3/envs/opensora/lib/python3.10/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/Efficient-Large-Model/VILA1.5-3b (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f05c4a8a9e0>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: 2546061a-6ded-4e3d-b5f5-87b097809eca)')

gehong-coder · Sep 08 '24 14:09

@gehong-coder This is a network error; maybe shutting down your proxy would work?
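
If the weights are already in the local cache, you could also bypass the hub lookup entirely and pass the resolved snapshot path to the script; something like this (a sketch using the standard huggingface_hub API, not a command from this repo):

# Resolve the locally cached snapshot of the model without any network access,
# then pass the printed path as --model-path to llava.eval.run_vila.
from huggingface_hub import snapshot_download

path = snapshot_download("Efficient-Large-Model/VILA1.5-3b", local_files_only=True)
print(path)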

Davidup1 · Sep 08 '24 14:09

@gehong-coder This is a network error; maybe shutting down your proxy would work?

I have downloaded the model; it is in the cache. It works when I use the snapshot path directly, like this:

python -W ignore -m llava.eval.run_vila \
    --model-path /home/hong.ge/.cache/huggingface/hub/models--Efficient-Large-Model--VILA1.5-3b/snapshots/42d1dda6807cc521ef27674ca2ae157539d17026 \
    --conv-mode vicuna_v1 \
    --query "

But why does it not work when I use the model name?

python -W ignore -m llava.eval.run_vila \
    --model-path Efficient-Large-Model/VILA1.5-3b \
    --conv-mode vicuna_v1 \
    --query "

gehong-coder · Sep 09 '24 01:09

I just followed the steps, but when I run the following code:

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Efficient-Large-Model/Llama-3-VILA1.5-8B")

it raises an error: ValueError: The checkpoint you are trying to load has model type `llava_llama` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

My transformers version is 4.37.2.

I also tried the Hugging Face functions and they didn't work. The inference command line in the README works.

yeyingdege · Sep 10 '24 04:09

I'm still getting this error:

ValueError: The checkpoint you are trying to load has model type `llava_llama` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

I uninstalled transformers and reinstalled it with pip install transformers --no-cache, but no luck.

nate-walter · Sep 24 '24 02:09

I tried this today. It sounds like a model registry issue. Model loading worked for me if I imported llava first.

import llava
from transformers import AutoModel
model = AutoModel.from_pretrained("Efficient-Large-Model/Llama-3-LongVILA-8B-1024Frames")
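
Presumably importing llava registers the custom classes with the Auto* machinery as a side effect, roughly like this (a sketch of the mechanism; the exact calls inside VILA are an assumption, but the class names come from the tracebacks above):

# Registering a custom model type is what lets AutoModel resolve "llava_llama".
from transformers import AutoConfig, AutoModel
from llava.model.language_model.llava_llama import LlavaLlamaConfig, LlavaLlamaModel

AutoConfig.register("llava_llama", LlavaLlamaConfig)
AutoModel.register(LlavaLlamaConfig, LlavaLlamaModel)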

srama2512 · Nov 19 '24 05:11

Note that VILA uses a different model file structure, and running VILA requires installing this repo first.

Lyken17 · Nov 19 '24 14:11

Qwen2ForCausalLM picked up the NVILA structure when loading the LLM. There are two possible causes: 1) a problem in the script configuration: check your scripts/nvila_example.sh and make sure the parameters match the example above; 2) around line 290 of entry.py, "/llm" is automatically appended to the model path, and another error may have prevented that line from executing, so resolve the other errors first.
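
For illustration, the layout implied above (the checkpoint path here is hypothetical): the language model lives under an llm/ subfolder of the NVILA checkpoint, and the loader appends it before handing the path to Qwen2ForCausalLM:

# Hypothetical sketch of the "/llm" concatenation described in this comment.
import os

checkpoint_dir = "./NVILA-8B"                  # hypothetical checkpoint root
llm_dir = os.path.join(checkpoint_dir, "llm")  # what Qwen2ForCausalLM should load
print(llm_dir)                                 # -> ./NVILA-8B/llm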

ztianlin · Jul 22 '25 07:07