AMDMIGraphX
Scale is 0 for quantizelinear and dequantizelinear
Problem: Using a scale of 0 in dequantizelinear is incorrect: it makes the quantized input irrelevant, so the optimization passes identify the input as dead code and eliminate it. A proper dequantization scale is a non-zero value that represents the actual scale used when the data was originally quantized.
With the mistral-7b model in mixed precision, we see the following error:
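To make the problem concrete, recall that linear dequantization computes `(x - zero_point) * scale` element-wise (the ONNX DequantizeLinear definition). The following is a minimal pure-Python sketch of that formula; `dequantize` is an illustrative helper written for this example, not a MIGraphX function.

```python
def dequantize(values, scale, zero_point=0):
    """Linear dequantization: (x - zero_point) * scale, element-wise."""
    return [(v - zero_point) * scale for v in values]


quantized = [-128, -1, 0, 37, 127]

# With a non-zero scale, distinct inputs map to distinct outputs.
nonzero = dequantize(quantized, scale=0.5)

# With scale = 0, every input collapses to 0, so the input no longer matters.
collapsed = dequantize(quantized, scale=0)

print(nonzero)
print(collapsed)
```

Because the zero-scale result does not depend on `values` at all, an optimizer is entitled to treat the input as dead.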
/tmp/comgr-295d4c/input/main.cpp:9:60: error: unused parameter 'x0' [-Werror,-Wunused-parameter]
9 | __device__ __attribute__((const)) auto inner_pointwise(Tx0 x0,Tx1 x1) {
| ^
1 error generated when compiling for gfx942.
terminate called after throwing an instance of 'migraphx::version_2_13_0::exception'
what(): /longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/AMDMIGraphX/src/targets/gpu/compile_ops.cpp:198: benchmark: No valid tuned compilation for pointwise with <no problem key>
Small reproducer to understand the bug:
import migraphx

p = migraphx.program()
m = p.get_main_module()
# Two half-precision inputs: m0 is dequantized, m1 is added to the result.
x_0 = m.add_parameter("m0", migraphx.shape(type="half_type", lens=[1, 256, 4096]))
x_1 = m.add_parameter("m1", migraphx.shape(type="half_type", lens=[1, 256, 4096]))
# Dequantization scale. 0.1 compiles; per the analysis below, 0 triggers the error.
x_2 = m.add_literal(migraphx.fill_argument(migraphx.shape(type="half_type", lens=[1]), 0.1))
# Broadcast the scalar scale to the input shape.
x_3 = m.add_instruction(migraphx.op("multibroadcast", out_lens=[1, 256, 4096]), [x_2])
x_4 = m.add_instruction(migraphx.op("dequantizelinear"), [x_0, x_3])
x_5 = m.add_instruction(migraphx.op("add"), [x_1, x_4])
m.add_return([x_5])
With scale = 0:
1. rewrite_quantization transforms dequantizelinear(x0, 0) into convert(x0) → mul(@3, 0) → add(x1, @4)
2. The simplify_algebra pass detects mul(@3, 0) and folds it to just 0 via the find_zero_ops pattern
3. Then add(x1, 0) gets optimized to just x1 by the find_unit_ops pattern
4. This eliminates the entire x0 computation chain, leaving only @return(x1)
5. The C++ generator still expects both parameters but only uses x1, causing the unused parameter error
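The elimination chain above can be sketched with a toy expression simplifier. This is not MIGraphX code: the tuple-based node format and the `simplify`, `used_params`, and `build` helpers are assumptions made for illustration, and the convert step is omitted. It only mimics the two rewrites named above (multiply by zero, add zero).

```python
# Toy nodes: ("param", name), ("const", value), ("mul", a, b), ("add", a, b)
ZERO = ("const", 0)


def simplify(node):
    """Apply two rewrites bottom-up: x * 0 -> 0 and x + 0 -> x."""
    kind = node[0]
    if kind in ("param", "const"):
        return node
    lhs, rhs = simplify(node[1]), simplify(node[2])
    if kind == "mul" and ZERO in (lhs, rhs):
        return ZERO  # multiplication by zero folds to the constant 0
    if kind == "add":
        if rhs == ZERO:
            return lhs  # adding zero is a no-op
        if lhs == ZERO:
            return rhs
    return (kind, lhs, rhs)


def used_params(node):
    """Collect the parameter names still reachable from the result."""
    if node[0] == "param":
        return {node[1]}
    if node[0] == "const":
        return set()
    return used_params(node[1]) | used_params(node[2])


def build(scale):
    """add(x1, mul(x0, scale)): the graph once dequantizelinear is rewritten."""
    return ("add", ("param", "x1"), ("mul", ("param", "x0"), ("const", scale)))


zero_scale = simplify(build(0))
nonzero_scale = simplify(build(0.1))

print(sorted(used_params(zero_scale)))     # x0 is no longer referenced
print(sorted(used_params(nonzero_scale)))  # both parameters survive
```

With a zero scale the whole expression reduces to the bare `x1` parameter, which mirrors the situation where the generated kernel still declares `x0` but never reads it.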
Expected behavior:
With scale = 0.1 or any small epsilon value:
1. rewrite_quantization transforms dequantizelinear(x0, 0.1) into convert(x0) → mul(@3, 0.1) → add(x1, @4)
2. Since 0.1 is not zero, the find_zero_ops optimization doesn't apply
3. The multiplication and addition operations remain in the graph
4. Both x0 and x1 are used in the final computation, so no unused parameter error occurs
Solution:
- Update the quantizer: Implement a preprocessing pass that replaces zero scales with a small epsilon value.
- simplify_algebra fix: Find dequantizelinear and quantizelinear instructions that have a scale of 0 and replace it.
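As a rough sketch of the first option, a preprocessing step could look like the following. The function name `replace_zero_scales` and the default `epsilon=1e-6` are hypothetical choices for this example, not part of the quantizer. Since the reproducer uses half-precision scales, the sketch also checks that the chosen epsilon does not round to zero in half precision, using the standard library's `struct` module.

```python
import struct


def to_half(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def replace_zero_scales(scales, epsilon=1e-6):
    """Return a copy of `scales` with exact zeros replaced by `epsilon`.

    `epsilon` is an assumed placeholder value; it must remain non-zero
    after rounding to the scale's data type (half precision here).
    """
    if to_half(epsilon) == 0:
        raise ValueError("epsilon underflows to zero in half precision")
    return [epsilon if s == 0 else s for s in scales]


scales = [0.02, 0.0, 0.5, 0]
patched = replace_zero_scales(scales)
print(patched)
```

Whether an epsilon is numerically acceptable for a given model is a separate question; this only guarantees that no scale reaching the later passes is exactly zero.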