TensorRT-LLM
How to use TRT-LLM to accelerate the original llava checkpoint liuhaotian/llava-v1.5-7b?
When I use the multimodal example, I download the original model liuhaotian/llava-v1.5-7b, but an error occurs:

```
llama = from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1164, in from_hugging_face
    config = create_config_from_hugging_face(model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1070, in create_config_from_hugging_face
    architecture = hf_config.architectures[0]
```

I found that config.json differs between liuhaotian/llava-v1.5-7b and llava-hf/llava-1.5-13b-hf, so how can I use TRT-LLM with the original model? Thanks in advance.
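For reference, you can compare the config field TRT-LLM's converter reads in both repos. A minimal sketch, assuming the `huggingface_hub` package is installed; the printed values are illustrative, not copied from either repo:

```python
# Sketch: compare the config fields that TRT-LLM's converter reads.
# Assumes `huggingface_hub` is installed (pip install huggingface_hub).
import json

from huggingface_hub import hf_hub_download

for repo_id in ("liuhaotian/llava-v1.5-7b", "llava-hf/llava-1.5-13b-hf"):
    config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(config_path) as f:
        config = json.load(f)
    # create_config_from_hugging_face() indexes hf_config.architectures[0],
    # so a missing or unexpected value here is what surfaces as the error above.
    print(repo_id, "->", config.get("model_type"), config.get("architectures"))
```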
@byshiue @QiJune We are facing a pressing issue: our business urgently needs the performance improvements this work provides. Could you let us know when you might be able to provide support for this issue? Alternatively, could you give us some guidance so we can proceed with the work on our own? Thank you in advance for your response.
@ganliqiang Could you please share your commands?
I just followed the instructions.

First, `export MODEL_NAME="llava-v1.5-13b"`, replacing `llava-1.5-13b-hf` with the original model name `llava-v1.5-13b`.

Second, run the command:

```bash
python ../llama/convert_checkpoint.py \
    --model_dir tmp/hf_models/${MODEL_NAME} \
    --output_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
    --dtype float16
```

The error occurs:

```
Traceback (most recent call last):
  File "/mnt/glq/trt_llm/TensorRT-LLM/examples/multimodal/../llama/convert_checkpoint.py", line 523, in
```
Try 0.7.1.
Hi, LMDeploy now supports serving liuhaotian/llava-v1.5-7b and provides OpenAI-compatible APIs. Feedback is welcome.
Related docs:
https://github.com/InternLM/lmdeploy/blob/main/docs/en/serving/api_server_vl.md
https://github.com/InternLM/lmdeploy/blob/main/docs/en/inference/vl_pipeline.md
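For a quick start, the VL pipeline doc linked above boils down to roughly the following. A minimal sketch, assuming `lmdeploy` is installed; the image URL is just a placeholder:

```python
# Sketch of LMDeploy's VL pipeline for the original llava checkpoint,
# following docs/en/inference/vl_pipeline.md linked above.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline("liuhaotian/llava-v1.5-7b")

# Any local path or URL works here; this one is a placeholder.
image = load_image("https://example.com/demo.jpg")
response = pipe(("Describe this image.", image))
print(response)
```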
@ganliqiang Could you use the Hugging Face checkpoint for this model? The Hugging Face model is supported and tested. TRT-LLM needs to read hf_config.architectures to make sure the correct TRT-LLM model class is used.
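Concretely, that means pointing the convert script at the llava-hf layout rather than the original one. A sketch, under the assumption that you keep the directory layout from the commands quoted earlier in this thread:

```bash
# Sketch: use the HF-layout checkpoint, whose config.json carries the
# architectures field TRT-LLM dispatches on. Paths mirror the commands
# quoted earlier in this thread.
export MODEL_NAME="llava-1.5-13b-hf"
git clone https://huggingface.co/llava-hf/${MODEL_NAME} tmp/hf_models/${MODEL_NAME}

python ../llama/convert_checkpoint.py \
    --model_dir tmp/hf_models/${MODEL_NAME} \
    --output_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
    --dtype float16
```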
Have you solved the problem yet? @ganliqiang
Yes, but I did not use this framework; I switched to llama.cpp, because the accuracy was a little low in my task here, while llama.cpp maintained the same results. FYI, you can try it too.
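If anyone wants to follow the same route, the LLaVA example in the llama.cpp repo runs roughly as below. A sketch, assuming you have already converted the weights to GGUF; the file names are placeholders, and the binary is `llava-cli` in older builds (`llama-llava-cli` after the 2024 binary renames):

```bash
# Sketch of the llama.cpp LLaVA flow (examples/llava in the llama.cpp repo).
# ggml-model-q4_k.gguf and mmproj-model-f16.gguf are placeholder names for
# your converted language-model and vision-projector files.
./llava-cli -m ggml-model-q4_k.gguf \
    --mmproj mmproj-model-f16.gguf \
    --image ./demo.jpg \
    -p "Describe the image."
```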
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.