Isotr0py

Results 20 issues of Isotr0py

FIX #9024 - Minor fix for `IndexError: list index out of range` on CPU backend...

x86 CPU

This PR aims to refactor the GGUF implementation for merged linear layers (`qkv_proj` and `gate_up_proj`)...

- Add support for Phi-3-vision models ([microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct) and [microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct))

- Error logs are emitted because `hf_list_repo_files` is called even when the model repo is local: ``` INFO 02-16 13:46:11 __init__.py:190] Automatically detected platform cuda. ERROR 02-16 13:46:11 config.py:102] Error retrieving...

Issue discussion on Slack: https://vllm-dev.slack.com/archives/C07R5Q1Q2BB/p1739776343893149?thread_ts=1739553140.299949&cid=C07R5Q1Q2BB - The `transformers` backend failed to load custom modules on the multiproc executor with `VLLM_WORKER_MULTIPROC_METHOD=spawn` because the custom module was falsely reported as already loaded. - This PR optimizes the automap resolving...

- [x] Add BNB support for `transformers` backend - [x] Update the available quantization in `transformers` backend docs.

documentation
ready

- To support different VLM HF datasets, the HF dataset sampling function in `benchmark_serving.py` is growing large and complex. - This PR aims to separate and decouple HF dataset...

**TODO** - [x] Fix profiling issue - [x] Add processor test

Related issue: #12724 - Rename v1 `ROCmAttention` to `TritonAttention` and add a fallback for Turing GPUs - Since the v1 ROCm attention backend is implemented with Triton, it can be used as...

rocm
needs-rebase
v1