Add additional rounding modes
This pull request introduces some additional rounding modes and provides a table that more accurately describes their behavior. Concretely, the following table has been added to docs/qonnx-custom-ops/quant_op.md:
Number \ ROUNDING_MODE | ROUND=HALF_EVEN | CEIL | FLOOR | UP | DOWN | HALF_UP | HALF_DOWN |
---|---|---|---|---|---|---|---|
5.5 | 6 | 6 | 5 | 6 | 5 | 6 | 5 |
2.5 | 2 | 3 | 2 | 3 | 2 | 3 | 2 |
1.6 | 2 | 2 | 1 | 2 | 1 | 2 | 2 |
1.1 | 1 | 2 | 1 | 2 | 1 | 1 | 1 |
1.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
-1.0 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
-1.1 | -1 | -1 | -2 | -2 | -1 | -1 | -1 |
-1.6 | -2 | -1 | -2 | -2 | -1 | -2 | -2 |
-2.5 | -2 | -2 | -3 | -3 | -2 | -3 | -2 |
-5.5 | -6 | -5 | -6 | -6 | -5 | -6 | -5 |
The newly introduced rounding modes are: UP, DOWN, HALF_UP, and HALF_DOWN. They were inspired by the rounding modes in the Java math library (https://docs.oracle.com/javase/8/docs/api/java/math/RoundingMode.html) and by the implementation in the Chisel dsptools library (https://github.com/ucb-bar/dsptools/blob/master/src/main/scala/dsptools/numbers/chisel_types/FixedPointTypeClass.scala#L156).
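For reference, the modes in the table can be expressed with plain numpy. The sketch below is illustrative only (the helper names are mine, not necessarily the ones used in quant.py) and reproduces the table rows:

```python
import numpy as np

# Illustrative numpy definitions of the rounding modes from the table above.
rounding_fxns = {
    "ROUND": np.round,                                             # round half to even
    "CEIL": np.ceil,                                               # towards +infinity
    "FLOOR": np.floor,                                             # towards -infinity
    "UP": lambda x: np.sign(x) * np.ceil(np.abs(x)),               # away from zero
    "DOWN": np.trunc,                                              # towards zero
    "HALF_UP": lambda x: np.sign(x) * np.floor(np.abs(x) + 0.5),   # half away from zero
    "HALF_DOWN": lambda x: np.sign(x) * np.ceil(np.abs(x) - 0.5),  # half towards zero
}

x = np.array([5.5, 2.5, 1.6, 1.1, 1.0, -1.0, -1.1, -1.6, -2.5, -5.5])
for mode, fxn in rounding_fxns.items():
    print(f"{mode:9s}", fxn(x))
```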
This partially solves the issue of incompatibility between a high-level Python implementation and a circuit implementation. For instance, consider the following test function for QKeras (v0.9.0):
import numpy as np
import qkeras


def test_quantized_bits_rounding_mode():
    alpha1 = qkeras.quantized_bits(bits=3, integer=2, keep_negative=True, alpha=1)
    alpha111 = qkeras.quantized_bits(bits=3, integer=2, keep_negative=True, alpha=[1, 1, 1])
    alpha_po2 = qkeras.quantized_bits(bits=3, integer=2, keep_negative=True, alpha="auto_po2")
    try:
        assert np.array_equal(alpha1(np.array([2.5, 2.5, 3.5])), alpha111(np.array([2.5, 2.5, 3.5])))
        assert np.array_equal(alpha1(np.array([2.5, 2.5, 3.5])), alpha_po2(np.array([2.5, 2.5, 3.5])))
    finally:
        print(alpha1.scale)
        print(alpha111.scale)
        print(alpha_po2.scale)
The function above will fail on the second assert. However, the scaling factors printed in the finally block will be 1, [1, 1, 1] and [1, 1, 1]. The reason is that when using "auto_po2" the rounding mode is effectively "round half up", i.e. round half away from zero (HALF_UP in the table above). This can be seen at: https://github.com/google/qkeras/blob/67e7c6b8cbd6befd594f142187ac4b73b35512ac/qkeras/quantizers.py#L570C45-L570C46
v = tf.floor(tf.abs(x) / scale + 0.5)
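To illustrate the difference, here is a minimal numpy sketch that mirrors the TF expression above (with scale fixed to 1, as in the test): rounding halves away from zero gives different results than the round-half-to-even behavior used by np.round and tf.round by default:

```python
import numpy as np

x = np.array([2.5, 2.5, 3.5])
scale = 1.0

# Mirrors the QKeras "auto_po2" expression floor(|x| / scale + 0.5),
# i.e. round half away from zero (HALF_UP in the table above).
half_up = np.sign(x) * np.floor(np.abs(x) / scale + 0.5)

# Round half to even, the default behavior of np.round and tf.round.
half_even = np.round(x / scale)

print(half_up)    # [3. 3. 4.]
print(half_even)  # [2. 2. 4.]
```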
This pull request does the following:
- Adds the new rounding modes to the spec.
- Adds an implementation of the rounding modes to the resolve_rounding_mode function in src/qonnx/custom_op/general/quant.py.
- Adds a simple test of the rounding modes in tests/custom_op/test_rounding_mode.py (see the sketch after this list).
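As a rough sketch of what such a test could look like (the exact contents of tests/custom_op/test_rounding_mode.py may differ), the expected values below are taken directly from the table above:

```python
import numpy as np
import pytest

from qonnx.custom_op.general.quant import resolve_rounding_mode


# Expected outputs are the columns of the table in docs/qonnx-custom-ops/quant_op.md.
@pytest.mark.parametrize(
    "rmode,expected",
    [
        ("ROUND", np.array([6, 2, 2, 1, 1, -1, -1, -2, -2, -6])),
        ("CEIL", np.array([6, 3, 2, 2, 1, -1, -1, -1, -2, -5])),
        ("FLOOR", np.array([5, 2, 1, 1, 1, -1, -2, -2, -3, -6])),
        ("UP", np.array([6, 3, 2, 2, 1, -1, -2, -2, -3, -6])),
        ("DOWN", np.array([5, 2, 1, 1, 1, -1, -1, -1, -2, -5])),
        ("HALF_UP", np.array([6, 3, 2, 1, 1, -1, -1, -2, -3, -6])),
        ("HALF_DOWN", np.array([5, 2, 2, 1, 1, -1, -1, -2, -2, -5])),
    ],
)
def test_rounding_modes(rmode, expected):
    test_input = np.array([5.5, 2.5, 1.6, 1.1, 1.0, -1.0, -1.1, -1.6, -2.5, -5.5])
    output = resolve_rounding_mode(rmode)(test_input)
    assert np.array_equal(output, expected)
```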
The pull request does NOT do the following:
- It does not fix the QKeras/Brevitas converters.
I refrained from updating the converters because, firstly, I don't know the code base very well and, secondly, the converter tests seem to be written with assert_allclose, i.e. they only check approximate compatibility. Issues with rounding modes can be quite subtle, so they would be hard to catch with approximate checks.
I have had success making bit-accurate conversions between QKeras and circuits in chisel4ml after introducing precise rounding modes. However, this only works when all tensors have a known quantization and the scaling factors are powers of two. Looking at the qonnx code base, I have a hard time seeing how the input quantization is specified. In chisel4ml, for instance, this is done directly, as shown below:
import numpy as np
import qkeras
import tensorflow as tf

x = x_in = tf.keras.layers.Input(shape=3)
x = qkeras.QActivation(
    qkeras.quantized_bits(bits=4, integer=3, keep_negative=True)
)(x)
x = qkeras.QDense(
    4,
    kernel_quantizer=qkeras.quantized_bits(
        bits=4, integer=3, keep_negative=True, alpha=np.array([0.5, 0.25, 1, 0.25])
    ),
)(x)
x = qkeras.QActivation(qkeras.quantized_relu(bits=3, integer=3))(x)
x = qkeras.QDense(
    1,
    kernel_quantizer=qkeras.quantized_bits(
        bits=4, integer=3, keep_negative=True, alpha=np.array([0.125])
    ),
)(x)
x = qkeras.QActivation(qkeras.quantized_relu(bits=3, integer=3))(x)
model = tf.keras.Model(inputs=[x_in], outputs=[x])
This means that the inputs must be quantized to a signed 4-bit integer. I realize that qonnx targets a broader class of neural network descriptions; however, I believe it would be useful to make a distinction for this kind of network (https://arxiv.org/abs/2011.10680 calls them dyadic neural networks), as:
- they are highly efficient to implement in hardware, and
- I believe they can be "simulated" with bit-level accuracy using floating-point operations.
I have only shown bit-level accuracy empirically; however, considering the way floating point is specified (a power-of-two exponent and a fixed-width mantissa), the equivalence should hold as long as the values do not grow too big for the mantissa/fraction field. And if they do get too big, you can, for example, move to 64-bit floating-point numbers.
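To make this concrete, here is a small example of my own (not part of the PR): integers and power-of-two scale factors are exactly representable in floating point, so an integer dot product followed by a power-of-two rescaling stays exact as long as the intermediate values fit into the mantissa (24 bits for float32, 53 bits for float64):

```python
import numpy as np

# 4-bit signed weights and activations with a power-of-two scale factor.
w_int = np.array([-8, 7, 3, -2], dtype=np.int64)
x_int = np.array([5, -6, 7, 1], dtype=np.int64)
scale = 2.0 ** -3  # exactly representable in floating point

# Exact integer reference.
acc_int = int(np.dot(w_int, x_int))  # -63

# Floating-point emulation: operands and accumulator fit into the 24-bit
# float32 mantissa, so no rounding occurs anywhere.
acc_f32 = np.dot(w_int.astype(np.float32), x_int.astype(np.float32))

assert acc_f32 == acc_int
assert np.float32(acc_int) * np.float32(scale) == acc_int * scale  # -7.875 exactly
```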