TensorRT-LLM
How to use TRT-LLM to accelerate the original llava checkpoint liuhaotian/llava-v1.5-7b?
When I use the multimodal example, I download the original model liuhaotian/llava-v1.5-7b, but an error occurs:

```
llama = from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1164, in from_hugging_face
    config = create_config_from_hugging_face(model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1070, in create_config_from_hugging_face
    architecture = hf_config.architectures[0]
```

I found that config.json differs between liuhaotian/llava-v1.5-7b and llava-hf/llava-1.5-13b-hf, so how can I use TRT-LLM with the original model? Thanks in advance.
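For reference, you can compare the config field TRT-LLM's converter reads in both repos. A minimal sketch, assuming the `huggingface_hub` package is installed; the printed values are illustrative, not copied from either repo:

```python
# Sketch: compare the config fields that TRT-LLM's converter reads.
# Assumes `huggingface_hub` is installed (pip install huggingface_hub).
import json

from huggingface_hub import hf_hub_download

for repo_id in ("liuhaotian/llava-v1.5-7b", "llava-hf/llava-1.5-13b-hf"):
    config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(config_path) as f:
        config = json.load(f)
    # create_config_from_hugging_face() indexes hf_config.architectures[0],
    # so a missing or unexpected value here is what surfaces as the error above.
    print(repo_id, "->", config.get("model_type"), config.get("architectures"))
```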
@byshiue @QiJune We are facing a pressing issue: our business urgently needs the performance improvements this work provides. Could you let us know when you might be able to provide support for this issue? Alternatively, could you give us some guidance so we can proceed with the work on our own? Thank you in advance for your response.
@ganliqiang Could you please share your commands?
I just followed the instructions.

First, `export MODEL_NAME="llava-v1.5-13b"`, replacing `llava-1.5-13b-hf` with the original model name `llava-v1.5-13b`.

Second, run the command:

```bash
python ../llama/convert_checkpoint.py \
    --model_dir tmp/hf_models/${MODEL_NAME} \
    --output_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
    --dtype float16
```

The error occurs:

```
Traceback (most recent call last):
  File "/mnt/glq/trt_llm/TensorRT-LLM/examples/multimodal/../llama/convert_checkpoint.py", line 523, in
```
Try 0.7.1.
Hi, LMDeploy now supports serving liuhaotian/llava-v1.5-7b and provides OpenAI-compatible APIs. Feedback is welcome.
Related docs:
https://github.com/InternLM/lmdeploy/blob/main/docs/en/serving/api_server_vl.md
https://github.com/InternLM/lmdeploy/blob/main/docs/en/inference/vl_pipeline.md
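For a quick start, the VL pipeline doc linked above boils down to roughly the following. A minimal sketch, assuming `lmdeploy` is installed; the image URL is just a placeholder:

```python
# Sketch of LMDeploy's VL pipeline for the original llava checkpoint,
# following docs/en/inference/vl_pipeline.md linked above.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline("liuhaotian/llava-v1.5-7b")

# Any local path or URL works here; this one is a placeholder.
image = load_image("https://example.com/demo.jpg")
response = pipe(("Describe this image.", image))
print(response)
```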
@ganliqiang Could you use the Hugging Face checkpoint for this model? The Hugging Face model is supported and tested. TRT-LLM needs to read hf_config.architectures to make sure the correct TRT-LLM model class is used.
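Concretely, that means pointing the convert script at the llava-hf layout rather than the original one. A sketch, under the assumption that you keep the directory layout from the commands quoted earlier in this thread:

```bash
# Sketch: use the HF-layout checkpoint, whose config.json carries the
# architectures field TRT-LLM dispatches on. Paths mirror the commands
# quoted earlier in this thread.
export MODEL_NAME="llava-1.5-13b-hf"
git clone https://huggingface.co/llava-hf/${MODEL_NAME} tmp/hf_models/${MODEL_NAME}

python ../llama/convert_checkpoint.py \
    --model_dir tmp/hf_models/${MODEL_NAME} \
    --output_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
    --dtype float16
```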
Have you solved the problem yet? @ganliqiang
Yes, but I did not use this framework; I switched to llama.cpp, because the accuracy was a little low in my task here, while llama.cpp maintained the same results. FYI, you can try it too.
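If anyone wants to follow the same route, the LLaVA example in the llama.cpp repo runs roughly as below. A sketch, assuming you have already converted the weights to GGUF; the file names are placeholders, and the binary is `llava-cli` in older builds (`llama-llava-cli` after the 2024 binary renames):

```bash
# Sketch of the llama.cpp LLaVA flow (examples/llava in the llama.cpp repo).
# ggml-model-q4_k.gguf and mmproj-model-f16.gguf are placeholder names for
# your converted language-model and vision-projector files.
./llava-cli -m ggml-model-q4_k.gguf \
    --mmproj mmproj-model-f16.gguf \
    --image ./demo.jpg \
    -p "Describe the image."
```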
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.