QPyTorch
float_quantize produces wrong output on multi-GPU setups
When I run a model after applying float_quantize to the weights or activations on a multi-GPU setup (a Hugging Face OPT model loaded with device_map='auto'), quantization of the layers placed on the second or later GPU goes wrong: the quantized output is mostly zeros.