Dipika Sikka
# Summary
- Add the ability to time function calls
- Enabled unless the `--disable-log-stats` CLI arg is used for the server as the timer's init and average...
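As an illustration of timing function calls behind an on/off switch, here is a minimal sketch. `CallTimer`, `timed`, and `average` are hypothetical names chosen for this example and are not taken from the PR; the `enabled` flag only stands in for the effect of `--disable-log-stats`.

```python
import time
from collections import defaultdict
from functools import wraps


class CallTimer:
    """Hypothetical helper: accumulates wall-clock time per function name."""

    def __init__(self, enabled: bool = True):
        # Assumption: `enabled` mirrors "stats logging is not disabled".
        self.enabled = enabled
        self.total_seconds = defaultdict(float)
        self.call_counts = defaultdict(int)

    def timed(self, func):
        """Decorator that records the duration of each call when enabled."""

        @wraps(func)
        def wrapper(*args, **kwargs):
            if not self.enabled:
                # Timing is off: call through without recording anything.
                return func(*args, **kwargs)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the call even if the function raised.
                self.total_seconds[func.__name__] += time.perf_counter() - start
                self.call_counts[func.__name__] += 1

        return wrapper

    def average(self, name: str) -> float:
        """Mean seconds per recorded call, or 0.0 if nothing was recorded."""
        count = self.call_counts.get(name, 0)
        return self.total_seconds.get(name, 0.0) / count if count else 0.0
```

Keeping the disabled path as a plain pass-through means the decorator adds almost no overhead when stats are turned off.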
# Summary
- Initial support for activation quantization (specifically static per-tensor for W8A8)
- Adds `CompressedTensorsConfig` and `CompressedTensorsLinearMethod` to support models quantized through [sparseml](https://github.com/neuralmagic/sparseml) and saved through [compressed-tensors](https://github.com/neuralmagic/compressed-tensors)
- Adds...
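For readers unfamiliar with the term, static per-tensor quantization uses one scale for the whole tensor, fixed ahead of time from calibration data. The sketch below shows the general idea for symmetric int8 in plain Python; the function names are hypothetical and this is not the code added by the PR.

```python
def calibrate_scale(calibration_batches):
    """Pick one scale from calibration data so the largest |value| maps to 127."""
    max_abs = max(abs(v) for batch in calibration_batches for v in batch)
    # Fall back to 1.0 so an all-zero calibration set does not divide by zero.
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize_static_per_tensor(values, scale):
    """Quantize every value with the same precomputed scale, clamped to int8."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in quantized]
```

Because the scale is fixed offline, values outside the calibrated range are clamped at runtime rather than rescaled.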
- Blocked on k8s runners being available; only AWS runners currently work
# Summary
- Add a step to publish the nightly wheel using the nm-action: https://github.com/neuralmagic/nm-actions/blob/main/actions/publish-whl/action.yml
- Once the wheel is built, add a step to build the nightly container using...
# Summary
- Add a `CompressedTensorsW8A8DynamicToken` scheme to support dynamic per-token activation quantization
- Update config parsing to support updates made to the `config.json` / quantization config provided with the...
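Dynamic per-token quantization, in general terms, computes a separate scale for each token (row) at runtime instead of relying on a precomputed one. The following is a minimal plain-Python sketch of that idea for symmetric int8; `quantize_dynamic_per_token` is a hypothetical name and does not reflect the internals of `CompressedTensorsW8A8DynamicToken`.

```python
def quantize_dynamic_per_token(activations):
    """Quantize each row (token) with its own scale computed on the fly.

    `activations` is a list of rows, one row of floats per token.
    Returns (quantized_rows, scales), with one scale per token.
    """
    quantized_rows = []
    scales = []
    for row in activations:
        max_abs = max((abs(v) for v in row), default=0.0)
        # An all-zero (or empty) row gets scale 1.0 to avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        quantized_rows.append(
            [max(-128, min(127, round(v / scale))) for v in row]
        )
        scales.append(scale)
    return quantized_rows, scales
```

The per-token scales have to be carried alongside the int8 values so the result can be rescaled after the matrix multiply.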
# Summary
- Updates the `gptq_marlin` parameters to use `vLLMParameters` to simplify linear layer weight loading
- Adds `PackedColumnParameter` to support packed parameters without row parallelism
# Summary:
- Splits up #6422 into two separate PRs. This is the first of the two. The second will leverage the weight loading changes introduced in this PR while...