
Trying to run VILA on Triton with the tensorrt_llm backend

dand-milestone opened this issue on Sep 3, 2024 · 0 comments

After successfully building the engine files, I want to deploy the VILA model with Triton Inference Server.

However, it fails because Transformers doesn't recognize the model type `llava_llama`:

```
[TensorRT-LLM][WARNING] stats_check_period_ms is not specified, will be set to 100 (ms)
I0903 13:38:59.297664 1309 libtensorrtllm.cc:184] "TRITONBACKEND_ModelInstanceInitialize: tensorrt_llm_0_0"
I0903 13:38:59.298296 1309 model_lifecycle.cc:839] "successfully loaded 'tensorrt_llm'"
I0903 13:38:59.443154 1309 pb_stub.cc:366] "Failed to initialize Python stub: ValueError: The checkpoint you are trying to load has model type `llava_llama` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.\n\nAt:\n  /usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py(1008): from_pretrained\n  /usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(856): from_pretrained\n  /opt/tritonserver/multimodal_ifb/postprocessing/1/model.py(81): initialize\n"
[TensorRT-LLM][WARNING] Don't setup 'skip_special_tokens' correctly (set value is ${skip_special_tokens}). Set it as True by default.
....
```
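For context, the stack trace reduces to `AutoTokenizer.from_pretrained` in the postprocessing model's `initialize` being pointed at the top-level VILA checkpoint, whose `config.json` declares the custom model type `llava_llama`. Below is a minimal sketch of one possible workaround, not a confirmed fix: point the tokenizer path (the `tokenizer_dir` parameter of the preprocessing/postprocessing models) at the base LLM tokenizer inside the checkpoint. The `llm/` subdirectory layout is an assumption based on how VILA checkpoints are typically organized, and the checkpoint path is hypothetical.

```python
# A minimal sketch, assuming the VILA checkpoint nests its base LLM
# (with a plain "llama" model_type) in an llm/ subdirectory, as VILA
# checkpoints typically do. The checkpoint path is hypothetical.
from transformers import AutoTokenizer

VILA_CHECKPOINT = "/models/vila-checkpoint"  # hypothetical path

# Loading from the checkpoint root reproduces the error above,
# because its config.json declares model_type "llava_llama":
#   AutoTokenizer.from_pretrained(VILA_CHECKPOINT)  # raises ValueError

# Loading from the nested LLM directory should work, because its
# config.json declares the stock "llama" model type:
tokenizer = AutoTokenizer.from_pretrained(f"{VILA_CHECKPOINT}/llm")
print(type(tokenizer).__name__)
```

If the tokenizer must be loaded from the checkpoint root, another option is to register the custom architecture before `initialize` runs, e.g. `AutoConfig.register("llava_llama", ...)` with a `LlamaConfig` subclass whose `model_type` is `"llava_llama"`, paired with `AutoTokenizer.register(...)`. Separately, the `${skip_special_tokens}` warning suggests the `config.pbtxt` template variables were not fully filled in (e.g. via the backend's `fill_template.py` tool), which may be worth checking as well.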

Any pointer on how to create a working model_repository for the VILA model would be highly appreciated.
