ultravox
Evaluate Ultravox performance when quantized
Quantize Ultravox to fp8 and measure how this affects the model's output quality as well as its inference speed. This would entail:
- adding quantization to ultravox/infer
- adding a flag to infer_tool for quantization
- running infer_tool for evaluation and summarizing the output.
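
The flag and the summary step could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual infer_tool code: the `--quantization` flag name, the `RunResult` type, and the `summarize` helper are all hypothetical, and the real tool may define its arguments differently. It also assumes the evaluation metric is "higher is better".

```python
import argparse
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical summary of one evaluation run (not an Ultravox type)."""

    label: str
    score: float  # task metric; assumed here to be higher-is-better
    seconds_per_sample: float  # mean wall-clock inference time per sample


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flag: the real infer_tool may expose this differently.
    parser = argparse.ArgumentParser(description="inference sketch")
    parser.add_argument(
        "--quantization",
        choices=["none", "fp8"],
        default="none",
        help="weight quantization to apply before running inference",
    )
    return parser


def summarize(baseline: RunResult, quantized: RunResult) -> dict:
    """Compare a quantized run against the unquantized baseline."""
    return {
        # Negative values mean the quantized model scored worse.
        "score_delta": quantized.score - baseline.score,
        # Values above 1.0 mean the quantized model ran faster.
        "speedup": baseline.seconds_per_sample / quantized.seconds_per_sample,
    }
```

Running the evaluation once with `--quantization none` and once with `--quantization fp8`, then passing both results to `summarize`, would give the quality change and the speedup in one place.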