Add a test verifying independence of group normalization results from batch size
Description
This PR adds a test diagnosing the issue reported in https://github.com/tensorflow/addons/issues/2745. This test is currently expected to fail.
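For context, the idea behind such a check can be sketched as follows. This is only an illustrative sketch, not the exact test added by this PR; the helper name and shapes are made up:

```python
import numpy as np
from tensorflow_addons.layers import GroupNormalization

# Illustrative sketch (not the exact test in this PR): for a correctly
# implemented GroupNormalization, the result for each sample should be the
# same whether it is normalized alone or as part of a larger batch.
def check_batch_size_independence(groups=2, channels=4, batch=4):
    layer = GroupNormalization(groups=groups, axis=-1)
    x = np.random.RandomState(0).normal(size=(batch, 8, 8, channels)).astype("float32")
    full_batch = layer(x).numpy()
    per_sample = np.concatenate([layer(x[i : i + 1]).numpy() for i in range(batch)])
    np.testing.assert_allclose(full_batch, per_sample, rtol=1e-5, atol=1e-5)
```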
Type of change
- [ ] Bug fix
- [ ] New Tutorial
- [ ] Updated or additional documentation
- [x] Additional Testing
- [ ] New Activation and the changes conform to the activation contribution guidelines
- [ ] New Callback and the changes conform to the callback contribution guidelines
- [ ] New Image addition and the changes conform to the image op contribution guidelines
- [ ] New Layer and the changes conform to the layer contribution guidelines
- [ ] New Loss and the changes conform to the loss contribution guidelines
- [ ] New Metric and the changes conform to the metric contribution guidelines
- [ ] New Optimizer and the changes conform to the optimizer contribution guidelines
- [ ] New RNN Cell and the changes conform to the rnn contribution guidelines
- [ ] New Seq2seq addition and the changes conform to the seq2seq contribution guidelines
- [ ] New Text addition and the changes conform to the text op contribution guidelines
Checklist:
- [x] I've properly formatted my code according to the guidelines
- [x] By running Black + Flake8
- [ ] By running pre-commit hooks
- [ ] This PR addresses an already submitted issue for TensorFlow Addons
- [ ] I have made corresponding changes to the documentation
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] This PR contains modifications to C++ custom-ops
How Has This Been Tested?
This PR adds a new test without modifying any existing functionality.
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
View this failed invocation of the CLA check for more information.
For the most up to date status, view the checks section at the bottom of the pull request.
@smokrow
You are an owner of some of the files modified in this pull request. Would you kindly review the changes whenever you have time? Thank you very much.
Can you commit the changes to fix this test?
Yes, of course. Here you go.
Unfortunately, the batch-size-insensitive reimplementation of `_apply_normalization()` proposed in commit eadca79 contained some bugs; in particular, in the line `abs_axis = (self.axis + group_rank) % group_rank`, which converts the possibly negative `self.axis` index to a non-negative index, the rank of the original input tensor should have been used instead of `group_rank`, the rank of the reshaped tensor. This bug caused the reductions to be performed along incorrect axes.
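To make the off-by-one concrete, here is a small illustration; the helper name is hypothetical, and I'm assuming the grouped reshape adds exactly one axis:

```python
# Hypothetical helper: convert a possibly negative axis index to a
# non-negative one, relative to a given tensor rank.
def to_non_negative_axis(axis: int, rank: int) -> int:
    return (axis + rank) % rank

input_rank = 4               # e.g. an NHWC image batch
group_rank = input_rank + 1  # assuming the reshape splits channels into (groups, channels_per_group)

print(to_non_negative_axis(-1, input_rank))  # 3, the intended channel axis of the input
print(to_non_negative_axis(-1, group_rank))  # 4, what the buggy line effectively computed
```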
As a result, I've added some extra commits to this branch. The bug mentioned above was not caught by the existing unit tests, in particular by `test_feature_input` and `test_picture_input`, which appear to be meant to compare the output of the `GroupNormalization` layer against that produced by a reference NumPy-based implementation, so I've amended these tests in commits 03c8977 and 41f179b: instead of comparing only the means of the output tensors, they now compare all elements of the output tensors individually. In commit 5d847f3 I've fixed the bugs to make these tests pass, and in 0e32ace I've added an extra optimisation that skips the input tensor transpose when it isn't needed. Finally, in commit 9be3d71, I've extended the tests verifying insensitivity to batch size to also cover reductions along axes other than the last, as well as the special cases of instance and layer normalization.
I've also done some benchmarking to compare the cost of the proposed implementation against the original one. The system used for benchmarking was equipped with an Intel i7-10700 processor and an Nvidia RTX 2070 Super graphics card and was running Windows. I used the following script to measure the time taken by group normalization for a sample of about a hundred combinations of 4D tensor shapes, group counts and group axis indices:
```python
import tensorflow as tf
from tensorflow_addons.layers.normalizations import GroupNormalization
import random, timeit, math

n_samples = 100
n_repetitions = 100
axis_choices = [1, 2, 3]
n_channels_choices = [10, 20, 40, 80, 160]
n_groups_choices = [-1, 1, 2, 5, 10]
batch_size_choices = [1, 2, 4, 8, 16]
height_width_choices = [40, 80, 160, 320, 640, 1280]

random.seed(1234)
print("axis n_groups batch_size height width n_channels time")
for axis, n_channels, n_groups, batch_size, height, width in zip(
        random.choices(axis_choices, k=n_samples),
        random.choices(n_channels_choices, k=n_samples),
        random.choices(n_groups_choices, k=n_samples),
        random.choices(batch_size_choices, k=n_samples),
        random.choices(height_width_choices, k=n_samples),
        random.choices(height_width_choices, k=n_samples)):

    shape_spec = [None, None, None]
    shape_spec.insert(axis, n_channels)

    @tf.function(input_signature=[tf.TensorSpec(shape=shape_spec, dtype=tf.float32)])
    def group_normalization(x):
        for i in tf.range(n_repetitions):
            x = gn(x)
        return x

    gn = GroupNormalization(axis=axis, groups=n_groups)

    input_shape = [batch_size, height, width]
    input_shape.insert(axis, n_channels)
    if math.prod(input_shape) > 500000000:
        continue  # too large tensor

    x = tf.zeros(input_shape)
    time = min(timeit.repeat('group_normalization(x)', repeat=5, number=1, globals=locals()))
    print(axis, n_groups, batch_size, height, width, n_channels, time)
```
In general, the proposed changes have a neutral or positive effect on performance, at least in the common cases where grouping occurs along axis 1 or 3 (corresponding to the NCHW and NHWC layouts, respectively). The following table shows the geometric means of the speed-up ratios, taken over the samples with the `axis` parameter set to 1, 2 or 3 and over the whole collection of samples, with and without the GPU:
Axis | CPU (n_repetitions = 10) | GPU (n_repetitions = 100) |
---|---|---|
1 | 1.02 | 0.99 |
2 | 0.69 | 1.12 |
3 | 0.80 | 0.89 |
1, 2 or 3 | 0.83 | 1.00 |
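For reference, the per-sample timing ratios were summarised as geometric means; a minimal way to compute such a summary is sketched below (the list names are illustrative):

```python
import math

# Geometric mean of per-sample timing ratios.
# Python 3.8+ also provides statistics.geometric_mean for this.
def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

# e.g. summary = geometric_mean([t_a / t_b for t_a, t_b in zip(times_a, times_b)])
```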
A clear performance degradation is only observed for group normalization along axis 2 (the penultimate axis) on a GPU; I believe this is a case that doesn't often occur in practice. For the common case of group normalization along the last axis, the changes improve performance both on the CPU and the GPU.
The GroupNormalization layer is now available in the core library; check it out here. We suggest using that instead. In any case, we would have deprecated this layer in TF-Addons once the corresponding layer had been introduced in core. Hence I am closing this PR for now.
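For anyone landing here later, a minimal usage sketch of the core Keras layer, assuming a recent TensorFlow release (roughly 2.11 or newer):

```python
import tensorflow as tf

# Core Keras equivalent of the TF-Addons layer (available in recent TF releases).
layer = tf.keras.layers.GroupNormalization(groups=32, axis=-1)
y = layer(tf.random.normal([8, 28, 28, 64]))  # 64 channels split into 32 groups
```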