nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
nm-vllm issues (32 results)
To be merged after upstream sync. This PR does two things: (a) changes the set of tests that we run on remote push, and (b) converts to using environment...
## Notes This PR is a work in progress and is based on https://github.com/vllm-project/vllm/pull/6396, so that will have to land first. ## Description This PR introduces a spiritual successor...