ENH: Disable 4-bit and 8-bit quantization on macOS
Resolves #483 at the frontend level by filtering the quantization options on render if the machine is Mac-like.
Tested locally; the 4-bit and 8-bit quantization options are successfully removed on a MacBook.
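
Roughly, the filter works like the sketch below (a minimal approximation, not the actual diff; the option labels and the `isMacLike` helper are assumptions):

```typescript
// Hypothetical sketch: detect a Mac-like machine from the browser and
// drop the 4-bit/8-bit quantization options before rendering.
const isMacLike = (): boolean =>
  /Mac|iPhone|iPad|iPod/i.test(navigator.platform || navigator.userAgent);

const ALL_QUANTIZATIONS = ["none", "4-bit", "8-bit"];

// Options actually shown in the dropdown on render.
const quantizationOptions = isMacLike()
  ? ALL_QUANTIZATIONS.filter((q) => q !== "4-bit" && q !== "8-bit")
  : ALL_QUANTIZATIONS;
```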
For most models, 8-bit quantization should work on macOS.
However, if a model uses bf16, it cannot run on macOS at all, not even the non-quantized version.
In that case, maybe we can add a field to the model JSON specifying whether the model is bf16. Am I missing an easier way to distinguish bf16 models?
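
A minimal sketch of that idea, assuming a hypothetical `is_bf16` field in the model JSON (the field name and the `ModelSpec` shape are invented for illustration, not part of the current schema):

```typescript
// Hypothetical: `is_bf16` is a proposed field, not in the current model JSON.
interface ModelSpec {
  model_name: string;
  is_bf16?: boolean;
}

// On Mac-like machines, hide models flagged as bf16 entirely,
// since they cannot run there even without quantization.
const visibleModels = (models: ModelSpec[], macLike: boolean): ModelSpec[] =>
  macLike ? models.filter((m) => !m.is_bf16) : models;
```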