Add a test verifying independence of group normalization results from batch size
Description
This PR adds a test diagnosing the issue reported in https://github.com/tensorflow/addons/issues/2745. This test is currently expected to fail.
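For context, the idea behind such a check can be sketched as follows. This is only an illustrative sketch, not the exact test added by this PR; the helper name and shapes are made up:

```python
import numpy as np
from tensorflow_addons.layers import GroupNormalization

# Illustrative sketch (not the exact test in this PR): for a correctly
# implemented GroupNormalization, the result for each sample should be the
# same whether it is normalized alone or as part of a larger batch.
def check_batch_size_independence(groups=2, channels=4, batch=4):
    layer = GroupNormalization(groups=groups, axis=-1)
    x = np.random.RandomState(0).normal(size=(batch, 8, 8, channels)).astype("float32")
    full_batch = layer(x).numpy()
    per_sample = np.concatenate([layer(x[i : i + 1]).numpy() for i in range(batch)])
    np.testing.assert_allclose(full_batch, per_sample, rtol=1e-5, atol=1e-5)
```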
Type of change
- [ ] Bug fix
- [ ] New Tutorial
- [ ] Updated or additional documentation
- [x] Additional Testing
- [ ] New Activation and the changes conform to the activation contribution guidelines
- [ ] New Callback and the changes conform to the callback contribution guidelines
- [ ] New Image addition and the changes conform to the image op contribution guidelines
- [ ] New Layer and the changes conform to the layer contribution guidelines
- [ ] New Loss and the changes conform to the loss contribution guidelines
- [ ] New Metric and the changes conform to the metric contribution guidelines
- [ ] New Optimizer and the changes conform to the optimizer contribution guidelines
- [ ] New RNN Cell and the changes conform to the rnn contribution guidelines
- [ ] New Seq2seq addition and the changes conform to the seq2seq contribution guidelines
- [ ] New Text addition and the changes conform to the text op contribution guidelines
Checklist:
- [x] I've properly formatted my code according to the guidelines
- [x] By running Black + Flake8
- [ ] By running pre-commit hooks
- [ ] This PR addresses an already submitted issue for TensorFlow Addons
- [ ] I have made corresponding changes to the documentation
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] This PR contains modifications to C++ custom-ops
How Has This Been Tested?
This PR adds a new test without modifying any existing functionality.
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
View this failed invocation of the CLA check for more information.
For the most up to date status, view the checks section at the bottom of the pull request.
@smokrow
You are an owner of some of the files modified in this pull request. Would you kindly review the changes whenever you have time? Thank you very much.
Can you commit the changes to fix this test?
Yes, of course. Here you go.
Unfortunately, the batch-size-insensitive reimplementation of `_apply_normalization()` proposed in commit eadca79 contained some bugs; in particular, in the line `abs_axis = (self.axis + group_rank) % group_rank`, which converts the possibly negative `self.axis` index to a non-negative index, the rank of the original input tensor should have been used instead of `group_rank`, the rank of the reshaped tensor. This bug caused the reductions to be performed along incorrect axes.
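To make the off-by-one concrete, here is a small illustration; the helper name is hypothetical, and I'm assuming the grouped reshape adds exactly one axis:

```python
# Hypothetical helper: convert a possibly negative axis index to a
# non-negative one, relative to a given tensor rank.
def to_non_negative_axis(axis: int, rank: int) -> int:
    return (axis + rank) % rank

input_rank = 4               # e.g. an NHWC image batch
group_rank = input_rank + 1  # assuming the reshape splits channels into (groups, channels_per_group)

print(to_non_negative_axis(-1, input_rank))  # 3, the intended channel axis of the input
print(to_non_negative_axis(-1, group_rank))  # 4, what the buggy line effectively computed
```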
As a result, I've added some extra commits to this branch. The bug mentioned above was not caught by the existing unit tests, in particular by `test_feature_input` and `test_picture_input`, which appear to be meant to compare the output of the `GroupNormalization` layer against that produced by a reference NumPy-based implementation, so I've amended these tests in commits 03c8977 and 41f179b: instead of comparing only the means of the output tensors, they now compare all elements of the output tensors individually. In commit 5d847f3 I've fixed the bugs to make these tests pass, and in 0e32ace I've added an extra optimisation that skips the input tensor transpose when it isn't needed. Finally, in commit 9be3d71, I've extended the tests verifying insensitivity to batch size to also cover reductions along axes other than the last, as well as the special cases of instance and layer normalization.
I've also done some benchmarking to compare the cost of the proposed implementation against the original one. The system used for benchmarking was equipped with an Intel i7-10700 processor and an Nvidia RTX 2070 Super graphics card and was running Windows. I used the following script to measure the time taken by group normalization for a sample of about a hundred combinations of 4D tensor shapes, group counts and group axis indices:
```python
import tensorflow as tf
from tensorflow_addons.layers.normalizations import GroupNormalization
import random, timeit, math

n_samples = 100
n_repetitions = 100
axis_choices = [1, 2, 3]
n_channels_choices = [10, 20, 40, 80, 160]
n_groups_choices = [-1, 1, 2, 5, 10]
batch_size_choices = [1, 2, 4, 8, 16]
height_width_choices = [40, 80, 160, 320, 640, 1280]

random.seed(1234)
print("axis n_groups batch_size height width n_channels time")
for axis, n_channels, n_groups, batch_size, height, width in zip(
        random.choices(axis_choices, k=n_samples),
        random.choices(n_channels_choices, k=n_samples),
        random.choices(n_groups_choices, k=n_samples),
        random.choices(batch_size_choices, k=n_samples),
        random.choices(height_width_choices, k=n_samples),
        random.choices(height_width_choices, k=n_samples)):

    shape_spec = [None, None, None]
    shape_spec.insert(axis, n_channels)

    @tf.function(input_signature=[tf.TensorSpec(shape=shape_spec, dtype=tf.float32)])
    def group_normalization(x):
        for i in tf.range(n_repetitions):
            x = gn(x)
        return x

    gn = GroupNormalization(axis=axis, groups=n_groups)

    input_shape = [batch_size, height, width]
    input_shape.insert(axis, n_channels)
    if math.prod(input_shape) > 500000000:
        continue  # too large tensor

    x = tf.zeros(input_shape)
    time = min(timeit.repeat('group_normalization(x)', repeat=5, number=1, globals=locals()))
    print(axis, n_groups, batch_size, height, width, n_channels, time)
```
In general, the proposed changes have a neutral or positive effect on performance, at least in the common cases where grouping occurs along axis 1 or 3 (corresponding to the NCHW and NHWC layouts, respectively). The following table shows the geometric means of the speed-up ratios, taken over the samples with the `axis` parameter set to 1, 2 or 3 and over the whole collection of samples, with and without the GPU:
Axis | CPU (n_repetitions = 10) | GPU (n_repetitions = 100) |
---|---|---|
1 | 1.02 | 0.99 |
2 | 0.69 | 1.12 |
3 | 0.80 | 0.89 |
1, 2 or 3 | 0.83 | 1.00 |
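For reference, the per-sample timing ratios were summarised as geometric means; a minimal way to compute such a summary is sketched below (the list names are illustrative):

```python
import math

# Geometric mean of per-sample timing ratios.
# Python 3.8+ also provides statistics.geometric_mean for this.
def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

# e.g. summary = geometric_mean([t_a / t_b for t_a, t_b in zip(times_a, times_b)])
```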
A clear performance degradation is only observed for group normalization along axis 2 (the penultimate axis) on a GPU; I believe this is a case that doesn't often occur in practice. For the common case of group normalization along the last axis, the changes improve performance both on the CPU and the GPU.
The GroupNormalization layer is now available in the core library; check it out here. We suggest using that instead. In any case, we would have deprecated this layer in TF-Addons once the corresponding layer had been introduced in core. Hence I am closing this PR for now.
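For anyone landing here later, a minimal usage sketch of the core Keras layer, assuming a recent TensorFlow release (roughly 2.11 or newer):

```python
import tensorflow as tf

# Core Keras equivalent of the TF-Addons layer (available in recent TF releases).
layer = tf.keras.layers.GroupNormalization(groups=32, axis=-1)
y = layer(tf.random.normal([8, 28, 28, 64]))  # 64 channels split into 32 groups
```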