Evaluate Ultravox performance when quantized

Open juberti opened this issue 8 months ago • 0 comments

Quantize Ultravox to fp8 and determine how this affects the model's output quality as well as its inference speed. This would entail:

  • adding quantization to ultravox/infer
  • adding a flag to infer_tool for quantization
  • running infer_tool for evaluation and summarizing the output.
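
As a rough illustration of the three steps above, here is a minimal sketch of what the flag and the evaluation summary could look like. Everything in it is an assumption made for illustration: the `--quantize` flag name, the `evaluate` and `summarize` helpers, and the exact-match quality metric are hypothetical and are not taken from the actual `infer_tool` or `ultravox/infer` code. The `infer` callable stands in for whatever inference entry point ends up supporting fp8.

```python
import argparse
import statistics
import time
from typing import Callable, Dict, Sequence


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flag: the real infer_tool may define its options differently.
    parser = argparse.ArgumentParser(description="Quantization evaluation sketch")
    parser.add_argument(
        "--quantize",
        choices=["none", "fp8"],
        default="none",
        help="Precision to use for inference (assumed option names).",
    )
    return parser


def evaluate(
    infer: Callable[[str], str],
    samples: Sequence[str],
    references: Sequence[str],
) -> Dict[str, float]:
    """Run `infer` on each sample, recording latency and exact-match accuracy."""
    latencies = []
    correct = 0
    for sample, reference in zip(samples, references):
        start = time.perf_counter()
        output = infer(sample)
        latencies.append(time.perf_counter() - start)
        # Exact match is a placeholder; a real evaluation would use the
        # project's own quality metric.
        correct += int(output.strip().lower() == reference.strip().lower())
    return {
        "accuracy": correct / len(samples),
        "mean_latency_s": statistics.mean(latencies),
    }


def summarize(baseline: Dict[str, float], quantized: Dict[str, float]) -> Dict[str, float]:
    """Compare a quantized run against the unquantized baseline."""
    q_latency = quantized["mean_latency_s"]
    return {
        "accuracy_delta": quantized["accuracy"] - baseline["accuracy"],
        # Guard against a zero latency measurement.
        "speedup": baseline["mean_latency_s"] / q_latency if q_latency > 0 else float("inf"),
    }
```

Running `evaluate` once with the unquantized model and once with the fp8 model on the same samples, then passing both results to `summarize`, would give the quality change and the speedup side by side.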

juberti, Jun 05 '24 20:06