neural-compressor FP4 encoding related

FP4 encoding related

Open Tiantian-Han opened this issue 1 year ago • 0 comments

https://github.com/intel/neural-compressor/blob/4372a762585189accc65196e081a0a7a85f5af9e/neural_compressor/torch/algorithms/weight_only/utility.py#L69

FP4_BNB = [-12.0, -8.0, -6.0, -4.0, -3.0, -2.0, -0.0625, 0, 0.0625, 2.0, 3.0, 4.0, 6.0, 8.0, 12.0]
FP4_E2M1 = [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.0625, 0, 0.0625, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

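Why is FP4_E2M1 defined like this? How is 0.0625 computed? According to the OCP Microscaling (MX) spec, the smallest nonzero E2M1 magnitude is 0.5, so shouldn't this entry be 0.5 instead? Also, is FP4_BNB the result of left-shifting FP4_E2M1 by one bit (every magnitude other than 0 and 0.0625 is doubled)? If so, does that correspond to an E3M0 format?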
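To make the comparison concrete, here is a minimal sketch that decodes all 16 E2M1 bit patterns as I understand the OCP layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1) and diffs the result against the two tables quoted above. The helper `decode_e2m1` is a hypothetical name of mine, not something from neural-compressor, and the script only shows where the tables differ; it does not explain where 0.0625 comes from or settle the E3M0 question.

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code, assuming sign | 2 exponent bits | 1 mantissa bit, bias 1."""
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:
        # Subnormal: no implicit leading 1, so the value is 0.M * 2^(1 - bias) = man * 0.5
        return sign * man * 0.5
    # Normal: implicit leading 1, so the value is 1.M * 2^(exp - bias)
    return sign * (1.0 + 0.5 * man) * 2.0 ** (exp - 1)


# Tables copied verbatim from the issue text above.
FP4_BNB = [-12.0, -8.0, -6.0, -4.0, -3.0, -2.0, -0.0625, 0, 0.0625, 2.0, 3.0, 4.0, 6.0, 8.0, 12.0]
FP4_E2M1 = [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.0625, 0, 0.0625, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

# Distinct magnitudes produced by the decoder vs. those listed in the table.
ocp_magnitudes = sorted({abs(decode_e2m1(c)) for c in range(16)})
lib_magnitudes = sorted({abs(v) for v in FP4_E2M1})

only_in_ocp = sorted(set(ocp_magnitudes) - set(lib_magnitudes))
only_in_lib = sorted(set(lib_magnitudes) - set(ocp_magnitudes))

# Check the "shifted by one bit" observation on the normal values (|v| >= 1).
scaled_e2m1 = [2 * v for v in FP4_E2M1 if abs(v) >= 1.0]
bnb_normals = [v for v in FP4_BNB if abs(v) >= 1.0]

print("decoded magnitudes:", ocp_magnitudes)
print("only in decoded set:", only_in_ocp)
print("only in FP4_E2M1 table:", only_in_lib)
print("FP4_BNB normals == 2 * FP4_E2M1 normals:", scaled_e2m1 == bnb_normals)
```

Under these assumptions the only mismatch is the subnormal slot (0.5 from the decoder versus 0.0625 in the table), and the remaining FP4_BNB entries are exactly twice the corresponding FP4_E2M1 entries.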

Jul 01 '24 03:07 Tiantian-Han
