neural-compressor
FP4 encoding related
https://github.com/intel/neural-compressor/blob/4372a762585189accc65196e081a0a7a85f5af9e/neural_compressor/torch/algorithms/weight_only/utility.py#L69
FP4_BNB = [-12.0, -8.0, -6.0, -4.0, -3.0, -2.0, -0.0625, 0, 0.0625, 2.0, 3.0, 4.0, 6.0, 8.0, 12.0]
FP4_E2M1 = [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.0625, 0, 0.0625, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
Why is FP4_E2M1 defined this way? How is 0.0625 computed? According to the OCP spec, shouldn't the smallest nonzero magnitude be 0.5? Is FP4_BNB the result of shifting FP4_E2M1 left by one bit? Does that correspond to it becoming E3M0?
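To make the comparison concrete, here is a minimal sketch that decodes all 16 codes under the E2M1 layout as I understand it from the OCP Microscaling spec (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1) and compares the result with the two tables quoted above. `decode_e2m1` is a hypothetical helper written for this issue, not a function from neural-compressor, and the bit layout is my assumption about the spec.

```python
import math

# Tables copied from the linked utility.py
FP4_BNB = [-12.0, -8.0, -6.0, -4.0, -3.0, -2.0, -0.0625, 0, 0.0625,
           2.0, 3.0, 4.0, 6.0, 8.0, 12.0]
FP4_E2M1 = [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.0625, 0, 0.0625,
            1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(code: int) -> float:
    """Decode a 4-bit code as sign | 2 exponent bits | 1 mantissa bit (bias 1).

    Hypothetical helper; the layout is an assumption based on the OCP spec.
    """
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:
        # Subnormal: 2^(1 - bias) * (man / 2) = man * 0.5
        magnitude = man * 0.5
    else:
        # Normal: 2^(exp - bias) * (1 + man / 2)
        magnitude = 2.0 ** (exp - 1) * (1.0 + man / 2.0)
    return sign * magnitude


# Distinct magnitudes under the assumed spec layout
spec_magnitudes = sorted({abs(decode_e2m1(c)) for c in range(16)})

# Distinct magnitudes in the two tables from utility.py
e2m1_magnitudes = sorted({abs(v) for v in FP4_E2M1})
bnb_magnitudes = sorted({abs(v) for v in FP4_BNB})

# Where does the FP4_E2M1 table differ from the decoded values?
diff = [(s, t) for s, t in zip(spec_magnitudes, e2m1_magnitudes) if s != t]

# Are the FP4_BNB normal values exactly 2x the FP4_E2M1 normal values?
scaled_matches = [2 * v for v in e2m1_magnitudes[2:]] == bnb_magnitudes[2:]

# An E3M0 format has no mantissa bit, so every magnitude would be a power of two
looks_like_e3m0 = all(math.log2(v).is_integer() for v in bnb_magnitudes[2:])

print(spec_magnitudes)
print(diff)
print(scaled_matches, looks_like_e3m0)
```

Under these assumptions, the FP4_E2M1 table differs from the decoded values only at the subnormal (0.5 in the decoding, 0.0625 in the table). The FP4_BNB normal values are the FP4_E2M1 ones multiplied by 2, which is a change of scale rather than a change of format: values such as 3, 6 and 12 still need a mantissa bit, so the table does not look like E3M0.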