Improved grid_sample

Open AlexanderLutsenko opened this issue 1 year ago • 1 comments

Issue Type

Others

OS

Linux

onnx2tf version number

1.18.14

onnx version number

onnxruntime version number

onnxsim (onnx_simplifier) version number

tensorflow version number

2.14

Download URL for ONNX

Parameter Replacement JSON

-

Description

TL;DR: I believe grid_sample is being converted incorrectly, plus it can be made much smaller in size and ~5x faster.

So I went searching for a better TensorFlow substitute for grid_sample and found some interesting stuff here: https://github.com/PINTO0309/onnx2tf/issues/426

The bug

The problem occurs with padding_mode='zeros' when a sampling coordinate falls outside the image bounds by less than a whole pixel. Consider this one-dimensional example:

Let x = -0.4. Then O[x] = I[x0]*0.4 + I[x1]*0.6, where x0 = -1, x1 = 0, and I[-1] = 0 as an out-of-bounds pixel, so the correct result is 0.6*I[0]. Instead, the current code sets the entire O[x] to 0.

I think the best way to get this right is to zero-pad the input image by one pixel on each side and add 1 to all pixel indices. The expensive post-processing phase then becomes unnecessary.
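To make the one-dimensional case concrete, here is a minimal pure-Python sketch. It is not onnx2tf's or nobuco's actual code: `sample_reference`, `sample_zeroing` and `sample_padded` are hypothetical names, and `sample_zeroing` is only a simplified model of the behaviour described above. The clamp in `sample_padded` is an added assumption so that coordinates far outside the image also land on the zero border.

```python
import math

def sample_reference(img, x):
    """Linear interpolation where each out-of-bounds tap contributes zero."""
    x0 = math.floor(x)
    w1 = x - x0           # weight of the right tap (x0 + 1)
    w0 = 1.0 - w1         # weight of the left tap (x0)

    def tap(i):
        return img[i] if 0 <= i < len(img) else 0.0

    return w0 * tap(x0) + w1 * tap(x0 + 1)

def sample_zeroing(img, x):
    """Simplified model of the reported bug: any out-of-bounds tap zeroes the output."""
    x0 = math.floor(x)
    x1 = x0 + 1
    if x0 < 0 or x1 >= len(img):
        return 0.0
    w1 = x - x0
    return (1.0 - w1) * img[x0] + w1 * img[x1]

def sample_padded(img, x):
    """Zero-pad by one pixel on each side, shift indices by +1, then interpolate."""
    padded = [0.0] + list(img) + [0.0]
    last = len(padded) - 1
    x0 = math.floor(x)
    w1 = x - x0
    # Clamp (an assumption of this sketch) so far-away coordinates hit the zero border.
    i0 = min(max(x0 + 1, 0), last)
    i1 = min(max(x0 + 2, 0), last)
    return (1.0 - w1) * padded[i0] + w1 * padded[i1]

img = [10.0, 20.0, 30.0, 40.0]
correct = sample_reference(img, -0.4)   # only the tap at index 0 contributes, with weight 0.6
buggy = sample_zeroing(img, -0.4)       # whole output forced to zero
padded_result = sample_padded(img, -0.4)
```

At x = -0.4 the tap at index 0 lies 0.4 away and so carries weight 0.6. The padded variant reproduces the reference value with no per-pixel masking afterwards.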

Broken TFLiteConverter

The way TensorFlow converts gather_nd to TFLite is completely broken. Not only is it offensively slow, it also adds this suspicious Concatenation op with a massive tensor of zeros inside.

[Image: cat]

But 1D gather seems to be alright, so that's what I ended up using.

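import tensorflow as tf

def gather(input, y, x, b, h, w, c, padding_mode):
    # Gathers input[b, y, x, :] through a flattened (row-major) spatial index.
    # Slow!
    # return tf.gather_nd(params=input, indices=tf.cast(tf.concat([y, x], axis=-1), dtype=tf.int32), batch_dims=1)

    if padding_mode == 'zeros':
        # input is expected to be zero-padded by one pixel on each side already,
        # with y and x shifted by +1, so the flattened size is (h + 2) * (w + 2).
        w_padded = w + 2
        h_padded = h + 2
        linear_coordinates = tf.cast(y * w_padded + x, dtype=tf.int32)
        linear_coordinates = tf.reshape(linear_coordinates, shape=(b, h, w))
        input = tf.reshape(input, shape=(b, h_padded * w_padded, c))
    else:
        # No padding: index the flattened h * w image directly.
        linear_coordinates = tf.cast(y * w + x, dtype=tf.int32)
        linear_coordinates = tf.reshape(linear_coordinates, shape=(b, h, w))
        input = tf.reshape(input, shape=(b, h * w, c))

    # One batched 1D gather instead of gather_nd.
    out = tf.gather(params=input, indices=linear_coordinates, batch_dims=1)
    return out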
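The index arithmetic behind the 1D gather can be checked without TensorFlow. The sketch below uses plain nested lists for a single-channel image. `zero_pad` and `gather_linear` are hypothetical helpers written for illustration, not part of nobuco or onnx2tf. It assumes row-major layout, which is what flattening the spatial dimensions with a reshape gives.

```python
def zero_pad(image):
    """Surround an h x w nested list with a one-pixel border of zeros."""
    w = len(image[0])
    border = [0.0] * (w + 2)
    return [border[:]] + [[0.0] + list(row) + [0.0] for row in image] + [border[:]]

def gather_linear(padded, ys, xs):
    """Look up padded[y][x] through a single row-major (1D) index."""
    w_padded = len(padded[0])
    flat = [v for row in padded for v in row]   # row-major flatten
    return [flat[y * w_padded + x] for y, x in zip(ys, xs)]

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
padded = zero_pad(image)

# Coordinates in the original image, including neighbours one pixel out of bounds.
coords = [(-1, -1), (0, 0), (1, 2), (2, 3), (0, 3)]
ys = [y + 1 for y, _ in coords]   # shift by +1 to address the padded image
xs = [x + 1 for _, x in coords]
values = gather_linear(padded, ys, xs)
```

In-bounds coordinates return the original pixels, while the out-of-bounds neighbours read zeros from the border, so no separate masking step is needed.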

Full code: https://github.com/AlexanderLutsenko/nobuco/blob/aa4745e6abb1124d90f7d3ace6d282f923f08a40/nobuco/node_converters/grid_sampling.py#L38

Correctness tests: https://github.com/AlexanderLutsenko/nobuco/blob/aa4745e6abb1124d90f7d3ace6d282f923f08a40/examples/grid_samplers.py

Benchmark results, Snapdragon 662:

| name | size (MB) | XNNPACK avg (ms) |
| --- | --- | --- |
| 1x3x32x32_grid_sampler_new.tflite | 0.0092 | 0.2618 |
| 1x3x32x32_grid_sampler_old.tflite | 0.0177 | 1.3888 |
| 1x3x64x64_grid_sampler_new.tflite | 0.0093 | 1.0207 |
| 1x3x64x64_grid_sampler_old.tflite | 0.0424 | 5.5717 |
| 1x3x128x128_grid_sampler_new.tflite | 0.0094 | 4.1274 |
| 1x3x128x128_grid_sampler_old.tflite | 0.1407 | 22.2125 |
| 4x3x32x32_grid_sampler_new.tflite | 0.0094 | 1.0212 |
| 4x3x32x32_grid_sampler_old.tflite | 0.0424 | 5.5643 |
| 4x3x64x64_grid_sampler_new.tflite | 0.0094 | 4.1527 |
| 4x3x64x64_grid_sampler_old.tflite | 0.1407 | 22.2211 |
| 4x3x128x128_grid_sampler_new.tflite | 0.0094 | 17.3625 |
| 4x3x128x128_grid_sampler_old.tflite | 0.5340 | 89.6066 |
| 8x3x32x32_grid_sampler_new.tflite | 0.0094 | 2.0717 |
| 8x3x32x32_grid_sampler_old.tflite | 0.0752 | 11.0984 |
| 8x3x64x64_grid_sampler_new.tflite | 0.0094 | 8.4565 |
| 8x3x64x64_grid_sampler_old.tflite | 0.2718 | 44.5216 |
| 8x3x128x128_grid_sampler_new.tflite | 0.0094 | 36.0839 |
| 8x3x128x128_grid_sampler_old.tflite | 1.0583 | 178.9390 |
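As a rough check of the ~5x figure, the per-shape speedups can be recomputed from the XNNPACK averages in the benchmark results. The timings below are copied from those results. The variable names are just for illustration.

```python
# (input shape, new avg in ms, old avg in ms), copied from the benchmark results.
results = [
    ("1x3x32x32", 0.2618, 1.3888),
    ("1x3x64x64", 1.0207, 5.5717),
    ("1x3x128x128", 4.1274, 22.2125),
    ("4x3x32x32", 1.0212, 5.5643),
    ("4x3x64x64", 4.1527, 22.2211),
    ("4x3x128x128", 17.3625, 89.6066),
    ("8x3x32x32", 2.0717, 11.0984),
    ("8x3x64x64", 8.4565, 44.5216),
    ("8x3x128x128", 36.0839, 178.9390),
]

# Speedup = old latency divided by new latency.
speedups = {shape: old / new for shape, new, old in results}

for shape, factor in speedups.items():
    print(f"{shape}: {factor:.2f}x")
```

Every shape comes out close to a factor of five, in line with the claim in the TL;DR.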

AlexanderLutsenko • Nov 20 '23 20:11

Excellent. Thank you.

Right now I am concentrating on creating my own high-precision model (work unrelated to converters). I will apply this when I have more free time.

PINTO0309 • Nov 20 '23 22:11