nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
GOALS
• Whisper support
• Exemplifies encoder/decoder (E/D) support
• E/D K/V caching
• E/D parallelism

TESTING
• HuggingFace Whisper model
• Replicate public English Speech Recognition (SR) test using...
Install the release version of nm-magic-wand.
Make ROCm rounding match Torch.
**SUMMARY** Fix a Python 3.8 compatibility issue in one of our benchmarking scripts.

**TEST PLAN** The following command should complete successfully in a py38 environment:

```shell
python \
    -m neuralmagic.benchmarks.run_benchmark_serving \
    ...
```
Upstream sync 2024-05-25 (#249) SUMMARY: Merge commits from https://github.com/vllm-project/vllm/commit/c7f2cf2b7f67bce5842fedfdba508440fe257375 to https://github.com/vllm-project/vllm/commit/f68470e803df575f294e67167b4b83adfe004cfa. Note that https://github.com/vllm-project/vllm/commit/c7f2cf2b7f67bce5842fedfdba508440fe257375 is NOT included in this merge.
Introducing an end-to-end test case that verifies basic correctness of the vllm server by comparing the tokens output by the vllm OpenAI server with tokens generated by the HuggingFace model...
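At its core, a correctness test of this kind checks that the server's greedy token stream matches the HuggingFace reference token-for-token. A minimal helper of the sort such a test might use to report where two sequences diverge (the function name is illustrative, not the actual test code):

```python
def first_divergence(ref_tokens: list, out_tokens: list) -> int:
    """Return the index of the first mismatching token between two
    sequences, or -1 if one is a prefix of the other (or they are equal)."""
    for i, (a, b) in enumerate(zip(ref_tokens, out_tokens)):
        if a != b:
            return i
    return -1

# Example: the sequences agree until position 2.
# first_divergence([1, 2, 3], [1, 2, 4]) -> 2
```

In the real test, `ref_tokens` would come from `model.generate` on the HuggingFace model and `out_tokens` from the vLLM OpenAI-compatible endpoint, both run with greedy decoding so the comparison is deterministic.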
# Summary - Add a `CompressedTensorsW8A8DynamicToken` scheme to support dynamic per-token activation quantization - Update config parsing to support updates made to the `config.json` / quantization config provided with the...
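Dynamic per-token activation quantization computes one int8 scale per token row at runtime, rather than using a static, pre-calibrated activation scale. A minimal NumPy sketch of the idea (illustrative only, not the `CompressedTensorsW8A8DynamicToken` implementation; the epsilon floor is an assumption to avoid division by zero):

```python
import numpy as np

def quantize_per_token(x: np.ndarray):
    """Symmetric int8 quantization with one dynamic scale per token (row).

    x has shape (num_tokens, hidden_dim); returns (int8 values, fp scales).
    """
    # One scale per row, chosen so the row's max magnitude maps to 127.
    scales = np.max(np.abs(x), axis=-1, keepdims=True) / 127.0
    scales = np.maximum(scales, 1e-8)  # guard against all-zero rows
    q = np.clip(np.round(x / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scales
```

Because the scale is recomputed per token at inference time, no activation calibration data is needed, at the cost of a small runtime reduction over each row.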
Description: - Based on a hyper-parameter sweep done on A6000 machines, I found the MxNxK block size of `128x128x64` and a stage count of `5` to be the most performant.
Add a test to make sure that magic_wand is an optional dependency when sparsity is not required.
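The optional-dependency pattern under test usually amounts to a guarded import: the dense path must work with magic_wand absent, and only the sparse path may demand it. A hedged sketch of that pattern (function names are illustrative, not nm-vllm's actual API):

```python
import importlib.util

def sparsity_available() -> bool:
    """Return True only if the optional magic_wand package is installed."""
    return importlib.util.find_spec("magic_wand") is not None

def load_weights(weights, sparse: bool = False):
    """Load weights; require magic_wand only when sparsity is requested."""
    if sparse and not sparsity_available():
        raise ImportError(
            "magic_wand is required for sparsity; install nm-magic-wand"
        )
    # The dense path must succeed with no optional dependency installed.
    return weights
```

A test for this simply imports the package in an environment without magic_wand and asserts that the dense path still works, while the sparse path raises a clear `ImportError`.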