auto-round
Set the default scale_dtype to FP16
There is no need to use an FP32 scale for packing with the AutoGPTQ Triton backend, so FP16 is now the default scale dtype. Accuracy should still be validated for some models.
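As a rough illustration, the sketch below shows how a scale dtype might be passed explicitly when quantizing with auto-round. The constructor arguments, the `"fp16"` value, and the example model name are assumptions for illustration only; consult the project's documentation for the exact API.

```python
# Minimal sketch, assuming AutoRound accepts a scale_dtype argument
# (argument names and values here are illustrative, not authoritative).
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # hypothetical example model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# With FP16 as the default scale dtype, passing scale_dtype explicitly is
# only needed when a model's accuracy regresses and FP32 must be restored.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, scale_dtype="fp16")
autoround.quantize()
```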