Isotr0py
You can put it in `_EMBEDDING_MODELS`.
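For reference, a minimal sketch of what that registration could look like, assuming the registry still maps the architecture name to a `(module, class)` tuple as in `vllm/model_executor/models/registry.py` (the new entry below is a hypothetical example, not a real model):

```python
# vllm/model_executor/models/registry.py (sketch)
_EMBEDDING_MODELS = {
    # architecture name -> (module under vllm/model_executor/models, class name)
    "MistralModel": ("llama", "LlamaEmbeddingModel"),
    # Hypothetical new entry:
    "MyEmbeddingModel": ("my_model", "MyEmbeddingModel"),
}
```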
#19872 should fix the CI deadlock, you can merge from main to get the fix.
See: https://docs.vllm.ai/en/latest/models/supported_models.html#id2

> You need to set the architecture name via `--hf-overrides` to match the one in vLLM. For example, to use DeepSeek-VL2 series models: `--hf-overrides '{"architectures": ["DeepseekVLV2ForCausalLM"]}'`
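Concretely, a minimal sketch of applying that override from Python, assuming the `LLM` entrypoint forwards `hf_overrides` to the engine the same way the CLI flag does (the checkpoint name is just one of the DeepSeek-VL2 repos):

```python
from vllm import LLM

# Override the architecture reported by the HF config so vLLM
# resolves it to its own DeepseekVLV2ForCausalLM implementation.
llm = LLM(
    model="deepseek-ai/deepseek-vl2-tiny",
    hf_overrides={"architectures": ["DeepseekVLV2ForCausalLM"]},
)
```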
Seems to be just a network timeout when fetching the test video; retrying.
That's because the input image size is too large, causing the number of image tokens to exceed `max_position_embeddings=4096` (the default config value for deepseek-vl2). The model should work if you...
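One possible workaround along those lines, as a hedged sketch: shrink the image before passing it in, so the image-token count stays within the 4096-position budget (the prompt format and the size cap below are assumptions, not values from the original comment):

```python
from PIL import Image
from vllm import LLM

llm = LLM(
    model="deepseek-ai/deepseek-vl2-tiny",
    hf_overrides={"architectures": ["DeepseekVLV2ForCausalLM"]},
)

image = Image.open("example.jpg")
image.thumbnail((1024, 1024))  # cap the resolution so fewer image tokens are produced

outputs = llm.generate({
    "prompt": "<image>\nDescribe this image.",
    "multi_modal_data": {"image": image},
})
print(outputs[0].outputs[0].text)
```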
Oh, it seems there are some issues in deepseek-vl2's hf_processor; let me dig into it deeper tonight.
Seems this field was initially added to the CPU executor in the earliest CPU support PR (#3634) with its declaration missing from `CacheConfig`. Since everything worked well after that, nobody caught this...
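For context, a minimal illustration of why this can go unnoticed (the field name below is hypothetical, not the actual one from #3634): Python happily sets attributes that were never declared at class level, so the missing declaration only surfaces for type checkers and readers, never at runtime:

```python
class CacheConfig:
    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        # Runs fine even though this attribute is never declared on the
        # class -- mirroring the situation described above.
        self.cpu_kvcache_space = 4  # hypothetical field name (GiB)
```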
[New Model]: We are able to run the Phi-3.5-vision-instruct model but want to run it in int4 quantization
I'm not sure what exactly "int4 quantization" means here, because it seems there is no BNB 4-bit quantized Phi3-V model released on HF. (The code given above is using...
It takes about 4 GB of VRAM to run the 4-bit AWQ-quantized Phi-3.5-vision-instruct. BTW, the AWQ model I uploaded was calibrated with the default dataset in `autoawq`, because I just used it to...
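For anyone who wants to try it, a hedged sketch of loading such an AWQ checkpoint in vLLM (the repo id is a placeholder for the uploaded model, not its actual name):

```python
from vllm import LLM

llm = LLM(
    model="Isotr0py/Phi-3.5-vision-instruct-AWQ",  # placeholder repo id
    quantization="awq",
    trust_remote_code=True,  # Phi-3.5-vision ships custom modeling code
    max_model_len=4096,      # keep the KV cache small on a ~4 GB budget
)
```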
> Not sure where is the best place to put this part. I'm OK to move this to somewhere else.

Seems that the `Nginx Loadbalancer` part is put under `Getting...