Cyrus Leung

Results: 30 issues by Cyrus Leung

Upgrading FastAPI and Pydantic fails to solve #8212 for Python 3.8 users. To avoid this, this PR updates the dependencies to use an older version of FastAPI for Python 3.8....

ready

Implement NVLM-D model. FIX #9040 FIX #9041

#9267 + #9446 introduced a mypy error. This PR fixes it.

ready

Split up decoder-only LM tests to avoid flakiness.

ready

Also, `QuantizationConfig` and the associated `prefix` argument are now passed to vision towers to maintain consistency. Nevertheless, since the vision tower is not quantized by existing methods yet, we ignore it and...

In #8925, I accidentally copied over `fork_new_process_for_each_test` to the `large_gpu_test` when it's actually unnecessary (since `current_platform.get_device_total_memory` is supposed to be stateless).

ready

Follow-up to #9303. This PR adds a `--task` option, which determines which model runner (for generation or embedding) to create when initializing vLLM. The default (`auto`) will...

ready

Python 3.10 introduces the [`anext`](https://docs.python.org/3/library/functions.html#anext) builtin which calls `__anext__` on the provided async iterator. It would be great if `pyupgrade` could automatically convert legacy `async_iterator.__anext__()` calls into `anext(async_iterator)` upon upgrading...

## Motivation

### Background

To provide more control over the model inputs, we currently define two methods for multi-modal models in vLLM:

- The **input processor** is called inside `LLMEngine`...

RFC

Fix 2-Node test OOM failure in `distributed/test_pipeline_parallel.py::test_tp_language_generation[microsoft/Phi-3.5-MoE-instruct-parallel_setup18-ray-0-auto-test_option`

Fix Spec Decode failure in `spec_decode/e2e/test_ngram_correctness.py::test_ngram_e2e_greedy_correctness[1--1-1-256-test_llm_kwargs0-baseline_llm_kwargs0-per_test_common_llm_kwargs0-common_llm_kwargs0]`

speculative-decoding
ready