Cyrus Leung issues

Results: 30 issues of Cyrus Leung

[Installation] Gate FastAPI version for Python 3.8

Upgrading FastAPI and Pydantic fails to solve #8212 for Python 3.8 users. To avoid this, this PR updates the dependencies to use an older version of FastAPI for Python 3.8....

ready

[Model] Support NVLM-D

Implement NVLM-D model. FIX #9040 FIX #9041

[CI/Build] Fix lint errors in mistral tokenizer

#9267 + #9446 introduced a mypy error. This PR fixes it.

ready

[CI/Build] Split up decoder-only LM tests

Split up decoder-only LM tests to avoid flakiness.

ready

[VLM] Enable overriding whether post layernorm is used in vision encoder + fix quant args

Also, `QuantizationConfig` and the associated `prefix` argument are now passed to vision towers to maintain consistency. Nevertheless, since the vision tower is not yet quantized by existing methods, we ignore it and...

[CI/Build] Remove unnecessary `fork_new_process`

In #8925, I accidentally copied over `fork_new_process_for_each_test` to the `large_gpu_test` when it's actually unnecessary (since `current_platform.get_device_total_memory` is supposed to be stateless).

ready

[Model] Add user-configurable task for models that support both generation and embedding

Follow-up to #9303. This PR adds a `--task` option that determines which model runner (for generation or embedding) to create when initializing vLLM. The default (`auto`) will...

ready

Convert `anext` calls on Python 3.10+

Python 3.10 introduces the [`anext`](https://docs.python.org/3/library/functions.html#anext) builtin which calls `__anext__` on the provided async iterator. It would be great if `pyupgrade` could automatically convert legacy `async_iterator.__anext__()` calls into `anext(async_iterator)` upon upgrading...

[RFC]: Merge input processor and input mapper for multi-modal models

## Motivation ### Background To provide more control over the model inputs, we currently define two methods for multi-modal models in vLLM: - The **input processor** is called inside `LLMEngine`...

RFC

[Bugfix] Fix 2 Node and Spec Decode tests

Fix 2-Node test OOM failure in `distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup18-ray-0-auto-test_option`. Fix Spec Decode failure in `spec_decode/e2e/test_ngram_correctness.py::test_ngram_e2e_greedy_correctness[1--1-1-256-test_llm_kwargs0-baseline_llm_kwargs0-per_test_common_llm_kwargs0-common_llm_kwargs0]`.

speculative-decoding

ready