text-generation-inference
Does tgi support image resize for qwen2-vl pipeline?
### System Info
I tried deploying a fine-tuned Qwen2-VL model with both TGI and vLLM, and some of the results differ between the two frameworks. TGI seems to consume more tokens than vLLM. I checked TGI's code, and it appears the image-resize logic is missing. In the Qwen2-VL pipeline, the image should be resized based on two arguments, max_pixels and min_pixels; see the sketch below.
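For reference, here is a minimal sketch of that resize step, modeled on the `smart_resize` helper in transformers' Qwen2-VL image processor. The default values shown are illustrative; the actual bounds come from the model's preprocessor_config.json.

```python
import math

def smart_resize(height: int, width: int, factor: int = 28,
                 min_pixels: int = 56 * 56, max_pixels: int = 28 * 28 * 1280):
    """Rescale (height, width) so the total area lands in [min_pixels, max_pixels]
    while keeping both sides divisible by `factor` (patch size * merge size)."""
    # Round each side to the nearest multiple of `factor`.
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    if h_bar * w_bar > max_pixels:
        # Image too large: shrink while preserving aspect ratio.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Image too small: grow while preserving aspect ratio.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar
```

Because max_pixels caps the resized area, it also caps the number of visual tokens a single image can contribute. Skipping this step means a large upload is tokenized at (or near) full resolution.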
### Information
- [x] Docker
- [ ] The CLI directly
### Tasks
- [ ] An officially supported command
- [ ] My own modifications
### Reproduction
Deploy a Qwen2-VL-7B model on an inference endpoint; uploading a large image then triggers an error saying the input tokens exceed 32768. A launch command along the lines of the one below should reproduce it.
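Something like the following standard TGI Docker invocation (image tag and model id are illustrative; in my case it is a fine-tuned checkpoint rather than the base model):

```bash
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id Qwen/Qwen2-VL-7B-Instruct \
    --max-input-tokens 32768
```

Then send a request containing a sufficiently large image (e.g. a 4000x3000 photo) to the endpoint.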
### Expected behavior
The server should resize the image according to preprocessor_config.json (max_pixels and min_pixels), ensuring that a single request never produces too many image tokens.
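As a point of comparison, the Hugging Face processor already applies this cap on its own. A quick (hypothetical) check of how many visual tokens a large image yields after the built-in resize:

```python
from PIL import Image
from transformers import AutoImageProcessor

# Loads min_pixels/max_pixels from the model's preprocessor_config.json.
ip = AutoImageProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
image = Image.new("RGB", (4000, 3000))  # stand-in for a large upload
out = ip(images=[image], return_tensors="pt")
t, h, w = out["image_grid_thw"][0].tolist()
# Each 2x2 group of patches is merged into one visual token (merge_size = 2).
print("visual tokens:", t * h * w // 4)
```

This stays well below the 32768-token limit, which is the behavior I would expect from the server as well.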
@AHEADer can you provide the docker command you are using with Qwen2-VL-7B?