Dipika Sikka issues

Results: 21 issues of Dipika Sikka

[Timings] Add the ability to log times for async and sync calls

# Summary
- Add the ability to time function calls
- Will be enabled unless the `--disable-log-stats` CLI arg is used for the server as the timer's init and average...
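
As a rough illustration of what timing both sync and async calls can look like, here is a minimal sketch. It is not the code from this PR: the names `log_time`, `average_times`, and `_timings` are hypothetical, and the link to `--disable-log-stats` is not modeled.

```python
import asyncio
import functools
import inspect
import time
from collections import defaultdict

# Hypothetical registry (not from the PR): function name -> elapsed seconds.
_timings = defaultdict(list)


def log_time(func):
    """Record wall-clock time for a sync or async callable."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                # Awaiting inside the wrapper includes suspension time.
                return await func(*args, **kwargs)
            finally:
                _timings[func.__name__].append(time.perf_counter() - start)
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            _timings[func.__name__].append(time.perf_counter() - start)
    return sync_wrapper


def average_times():
    """Return the mean elapsed time per recorded function name."""
    return {name: sum(v) / len(v) for name, v in _timings.items()}


@log_time
def add(a, b):
    return a + b


@log_time
async def add_async(a, b):
    await asyncio.sleep(0)
    return a + b
```

Calling `add(1, 2)` and `asyncio.run(add_async(1, 2))` and then `average_times()` yields one average per decorated function. The `try`/`finally` means a call that raises is still timed.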

[Kernel] Initial Activation Quantization Support

# Summary
- Initial support for Activation Quantization (specifically static per-tensor for W8A8)
- Adds `CompressedTensorsConfig` and `CompressedTensorsLinearMethod` to support models quantized through [sparseml](https://github.com/neuralmagic/sparseml) and saved through [compressed-tensors](https://github.com/neuralmagic/compressed-tensors)
- Adds...
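
For readers unfamiliar with the term, static per-tensor quantization uses a single scale, fixed ahead of time, for a whole tensor. The sketch below shows the arithmetic in plain Python for int8. It assumes a symmetric scheme with no zero point, which the summary does not confirm, and the function names are hypothetical rather than taken from the PR.

```python
def quantize_per_tensor(values, scale):
    """Quantize floats to int8 using one fixed (static) scale.

    Assumes symmetric quantization: no zero point, range [-128, 127].
    """
    return [max(-128, min(127, round(v / scale))) for v in values]


def dequantize_per_tensor(quantized, scale):
    """Map int8 values back to approximate floats with the same scale."""
    return [q * scale for q in quantized]


# The scale is chosen once (for example from calibration data) and reused
# for every input, which is what makes the scheme "static".
scale = 0.5
q = quantize_per_tensor([0.0, 1.0, -1.0, 100.0, -100.0], scale)
restored = dequantize_per_tensor(q, scale)
```

Values that fall outside the representable range after scaling are clamped, so `100.0` and `-100.0` saturate at the int8 limits rather than wrapping around.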


[GHA] Add workflow files to run weekly and nightly tests/run llama-7b models

update

[GHA] Add steps to publish nightly wheel and build nightly container

[WIP] Update/expand finetune tests

[Activation Quantization] Dynamic Per Token Support

[Misc] Update `gptq_marlin` to use new vLLMParameters

[Misc] Update Fused MoE weight loading

add awq fused moe method