nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
nm-vllm issues (32 results)
To be merged after upstream sync. This PR does two things: (a) changes the set of tests that we run on remote push, and (b) converts to using environment...
## Notes This PR is a work in progress and is based on https://github.com/vllm-project/vllm/pull/6396, so that will have to land first. ## Description This PR introduces a spiritual successor...