ENH: Disable 4-bit and 8-bit quantization on macOS
Resolves #483 at the frontend level by filtering the quantization options on render if the machine is Mac-like.
Tested locally; the 4-bit and 8-bit quantization options are successfully removed on a MacBook.
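
Roughly, the filter works like the sketch below (a minimal approximation, not the actual diff; the option labels and the `isMacLike` helper are assumptions):

```typescript
// Hypothetical sketch: detect a Mac-like machine from the browser and
// drop the 4-bit/8-bit quantization options before rendering.
const isMacLike = (): boolean =>
  /Mac|iPhone|iPad|iPod/i.test(navigator.platform || navigator.userAgent);

const ALL_QUANTIZATIONS = ["none", "4-bit", "8-bit"];

// Options actually shown in the dropdown on render.
const quantizationOptions = isMacLike()
  ? ALL_QUANTIZATIONS.filter((q) => q !== "4-bit" && q !== "8-bit")
  : ALL_QUANTIZATIONS;
```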
For most models, 8-bit quantization should work on macOS.
However, if a model uses bf16, it cannot run on macOS at all, not even the non-quantized version.
In that case, maybe we can add a field to the model JSON specifying whether the model is bf16. Am I missing an easier way to distinguish bf16 models?
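
A minimal sketch of that idea, assuming a hypothetical `is_bf16` field in the model JSON (the field name and the `ModelSpec` shape are invented for illustration, not part of the current schema):

```typescript
// Hypothetical: `is_bf16` is a proposed field, not in the current model JSON.
interface ModelSpec {
  model_name: string;
  is_bf16?: boolean;
}

// On Mac-like machines, hide models flagged as bf16 entirely,
// since they cannot run there even without quantization.
const visibleModels = (models: ModelSpec[], macLike: boolean): ModelSpec[] =>
  macLike ? models.filter((m) => !m.is_bf16) : models;
```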