onnx2tf
Improved grid_sample
Issue Type
Others
OS
Linux
onnx2tf version number
1.18.14
onnx version number
onnxruntime version number
onnxsim (onnx_simplifier) version number
tensorflow version number
2.14
Download URL for ONNX
Parameter Replacement JSON
-
Description
TL;DR: I believe `grid_sample` is being converted incorrectly, plus it can be made much smaller in size and ~5x faster.
So I went searching for a better TensorFlow substitute for `grid_sample` and found some interesting material here: https://github.com/PINTO0309/onnx2tf/issues/426
The bug
The problem occurs with `padding_mode='zeros'` when a pixel index goes out of image bounds by less than a whole pixel. Consider this one-dimensional example:
Let x = -0.4. Then O[x] = I[x0]*0.4 + I[x1]*0.6, where x0 = -1, x1 = 0, and I[-1] = 0 as an out-of-bounds pixel. (The nearer pixel, x1, gets the larger weight.)
Instead, the current code sets the entire O[x] to 0.
I think the best way to do this right is to zero-pad the input image by one pixel on each side and add 1 to all pixel indices. The expensive post-processing phase then becomes unnecessary.
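To make the padding idea concrete, here is a minimal 1D sketch in NumPy rather than TensorFlow. The helper name `sample_1d_zeros` is made up for illustration and is not part of onnx2tf or nobuco; it only shows how padding by one pixel and shifting indices by 1 yields the partial contribution at x = -0.4.

```python
import numpy as np

def sample_1d_zeros(image, x):
    """Linearly interpolate a 1D signal at fractional positions x,
    treating out-of-bounds pixels as zeros (illustrative helper)."""
    image = np.asarray(image, dtype=np.float64)
    n = len(image)
    # Zero-pad by one pixel on each side: padded[i + 1] == image[i].
    padded = np.pad(image, 1)
    # Positions more than one pixel outside the image only ever touch zeros,
    # so clamping them to the padded border does not change the result.
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, float(n))
    x0 = np.floor(x).astype(np.int64)
    w1 = x - x0          # weight of the right neighbour
    w0 = 1.0 - w1        # weight of the left neighbour
    i0 = x0 + 1          # shift indices by 1 to address the padded signal
    i1 = np.minimum(i0 + 1, n + 1)
    return padded[i0] * w0 + padded[i1] * w1

image = np.array([10.0, 20.0, 30.0])
print(sample_1d_zeros(image, -0.4))  # 0.6 * image[0], not 0
```

No masking step is needed afterwards, because the zero border already supplies the out-of-bounds values.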
Broken TFLiteConverter
The way TensorFlow converts `gather_nd` to TFLite is completely broken. Not only is it offensively slow, it also adds a suspicious `Concatenation` op with a massive tensor of zeros inside.
But 1D `gather` seems to be alright, so that's what I ended up using.
import tensorflow as tf

def gather(input, y, x, b, h, w, c, padding_mode):
    # Slow!
    # return tf.gather_nd(params=input, indices=tf.cast(tf.concat([y, x], axis=-1), dtype=tf.int32), batch_dims=1)
    if padding_mode == 'zeros':
        # The input has been zero-padded by one pixel on each side.
        w_padded = w + 2
        h_padded = h + 2
        # Flatten (y, x) into a single index over the padded spatial dimensions.
        linear_coordinates = tf.cast(y * w_padded + x, dtype=tf.int32)
        linear_coordinates = tf.reshape(linear_coordinates, shape=(b, h, w))
        input = tf.reshape(input, shape=(b, h_padded * w_padded, c))
    else:
        # Flatten (y, x) into a single index over the unpadded spatial dimensions.
        linear_coordinates = tf.cast(y * w + x, dtype=tf.int32)
        linear_coordinates = tf.reshape(linear_coordinates, shape=(b, h, w))
        input = tf.reshape(input, shape=(b, h * w, c))
    # 1D gather along the flattened spatial axis, one batch element at a time.
    out = tf.gather(params=input, indices=linear_coordinates, batch_dims=1)
    return out
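As a sanity check on the index arithmetic, the following NumPy sketch (an analogue, not the TensorFlow code above) shows that gathering with the linearized coordinate `y * w + x` over a flattened image matches direct 2D indexing. The function name `gather_linear` is hypothetical, and an NHWC layout is assumed.

```python
import numpy as np

def gather_linear(image, y, x):
    """image: (b, h, w, c); y, x: integer arrays of shape (b, h_out, w_out).
    Returns pixels of shape (b, h_out, w_out, c) via a 1D gather."""
    b, h, w, c = image.shape
    flat = image.reshape(b, h * w, c)        # merge the spatial axes
    linear = (y * w + x).reshape(b, -1)      # row-major flat index
    # Indices broadcast over the channel axis, like a batched 1D gather.
    out = np.take_along_axis(flat, linear[:, :, None], axis=1)
    return out.reshape(*y.shape, c)

rng = np.random.default_rng(0)
image = rng.normal(size=(2, 4, 5, 3))
y = rng.integers(0, 4, size=(2, 4, 5))
x = rng.integers(0, 5, size=(2, 4, 5))

# Reference result using direct (batch, y, x) indexing.
expected = image[np.arange(2)[:, None, None], y, x]
print(np.array_equal(gather_linear(image, y, x), expected))
```

For the zero-padded case the same identity holds with `w + 2` in place of `w`, provided the coordinates have already been shifted by 1.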
Full code: https://github.com/AlexanderLutsenko/nobuco/blob/aa4745e6abb1124d90f7d3ace6d282f923f08a40/nobuco/node_converters/grid_sampling.py#L38
Correctness tests: https://github.com/AlexanderLutsenko/nobuco/blob/aa4745e6abb1124d90f7d3ace6d282f923f08a40/examples/grid_samplers.py
Benchmark results, Snapdragon 662:
name | size | XNNPACK avg |
---|---|---|
1x3x32x32_grid_sampler_new.tflite | 0.0092 Mb | 0.2618 ms |
1x3x32x32_grid_sampler_old.tflite | 0.0177 Mb | 1.3888 ms |
1x3x64x64_grid_sampler_new.tflite | 0.0093 Mb | 1.0207 ms |
1x3x64x64_grid_sampler_old.tflite | 0.0424 Mb | 5.5717 ms |
1x3x128x128_grid_sampler_new.tflite | 0.0094 Mb | 4.1274 ms |
1x3x128x128_grid_sampler_old.tflite | 0.1407 Mb | 22.2125 ms |
4x3x32x32_grid_sampler_new.tflite | 0.0094 Mb | 1.0212 ms |
4x3x32x32_grid_sampler_old.tflite | 0.0424 Mb | 5.5643 ms |
4x3x64x64_grid_sampler_new.tflite | 0.0094 Mb | 4.1527 ms |
4x3x64x64_grid_sampler_old.tflite | 0.1407 Mb | 22.2211 ms |
4x3x128x128_grid_sampler_new.tflite | 0.0094 Mb | 17.3625 ms |
4x3x128x128_grid_sampler_old.tflite | 0.5340 Mb | 89.6066 ms |
8x3x32x32_grid_sampler_new.tflite | 0.0094 Mb | 2.0717 ms |
8x3x32x32_grid_sampler_old.tflite | 0.0752 Mb | 11.0984 ms |
8x3x64x64_grid_sampler_new.tflite | 0.0094 Mb | 8.4565 ms |
8x3x64x64_grid_sampler_old.tflite | 0.2718 Mb | 44.5216 ms |
8x3x128x128_grid_sampler_new.tflite | 0.0094 Mb | 36.0839 ms |
8x3x128x128_grid_sampler_old.tflite | 1.0583 Mb | 178.9390 ms |
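For reference, the per-shape speedup can be computed from the timings in the table. This small script only divides the listed old and new averages and introduces no new measurements.

```python
# (old_ms, new_ms) pairs copied from the benchmark table.
timings = {
    "1x3x32x32": (1.3888, 0.2618),
    "1x3x64x64": (5.5717, 1.0207),
    "1x3x128x128": (22.2125, 4.1274),
    "4x3x32x32": (5.5643, 1.0212),
    "4x3x64x64": (22.2211, 4.1527),
    "4x3x128x128": (89.6066, 17.3625),
    "8x3x32x32": (11.0984, 2.0717),
    "8x3x64x64": (44.5216, 8.4565),
    "8x3x128x128": (178.9390, 36.0839),
}

speedups = {name: old / new for name, (old, new) in timings.items()}
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

The ratios fall between roughly 4.9x and 5.5x, consistent with the ~5x figure in the TL;DR.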
Excellent. Thank you.
Right now, I am concentrating on creating my own high-precision model (work not related to converters). I will apply it when I have more time in my private life.