
ConvNeXt not compatible with mixed precision

Open andreped opened this issue 3 years ago • 5 comments

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • TensorFlow installed from (source or binary): nightly
  • TensorFlow version (use command below): 2.11.0a20220816
  • Python version: 3.8.10
  • GPU model and memory: NVIDIA Quadro RTX 6000 (24 GB VRAM)
  • Do you want to contribute a PR? (yes/no): No

Describe the problem. Following up on this issue, I observed that ConvNeXt was not compatible with TimeDistributed; this was then fixed in the nightly release (see here). As that was working, I then tried to use mixed precision, where I got a new error. Note that MobileNetV3 works seamlessly with mixed precision. Hence, I think only ConvNeXt is affected, but I am not sure.

I believe the model itself works fine with mixed precision, but it contains the layer LayerScale, which may not (see logs below for more details).
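For illustration, the same dtype mismatch can be reproduced in isolation (a minimal sketch of the mechanism, not the actual ConvNeXt code; the exact error message differs between eager and graph mode):

import tensorflow as tf

# Under mixed_float16 the layer inputs arrive as float16, while a plain
# tf.Variable like the gamma created in LayerScale.build stays float32.
x = tf.ones((2, 96), dtype=tf.float16)
gamma = tf.Variable(tf.ones((96,)))  # defaults to float32

y = x * gamma  # raises: the Mul op requires both operands to share a dtype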

Describe the expected behavior. Mixed precision should work seamlessly with ConvNeXt.

Standalone code to reproduce the issue. It failed when initializing the ConvNeXt model after mixed precision was enabled. Hence, running the snippet below should reproduce the issue (note that the logs are not from this exact script, but I believe you will get the same error):

import tensorflow as tf
from tensorflow.keras.applications import ConvNeXtSmall

tf.keras.mixed_precision.set_global_policy('mixed_float16')
model = ConvNeXtSmall(include_top=False, weights="imagenet", pooling="none")

Source code / logs.

Traceback (most recent call last):
  File "source/main.py", line 454, in <module>
    main()
  File "source/main.py", line 216, in main
    model = get_classifier_architecture(MODEL_ARCH=ret.arch, ret=ret, instance_size=instance_size,
  File "/home/andrep/workspace/bcgrade/source/models/classifiers.py", line 371, in get_classifier_architecture
    shared_base_model = ConvNeXtSmall(include_top=False, weights="imagenet", pooling="none", input_shape=instance_size[1:])
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/applications/convnext.py", line 610, in ConvNeXtSmall
    return ConvNeXt(
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/applications/convnext.py", line 516, in ConvNeXt
    x = ConvNeXtBlock(
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/applications/convnext.py", line 283, in apply
    x = LayerScale(
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/andrep/workspace/bcgrade/venv/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 588, in _ExtractInputsAndAttrs
    raise TypeError(
TypeError: Exception encountered when calling layer "convnext_small_stage_0_block_0_layer_scale" (type LayerScale).

Input 'y' of 'Mul' Op has type float32 that does not match type float16 of argument 'x'.

Call arguments received by layer "convnext_small_stage_0_block_0_layer_scale" (type LayerScale):
  • x=tf.Tensor(shape=(None, None, None, 96), dtype=float16)

andreped avatar Aug 17 '22 11:08 andreped

@gowthamkpr, I was able to reproduce the issue on tensorflow v2.8, v2.9 and nightly. Kindly find the gist of it here.

tilakrayal avatar Aug 18 '22 12:08 tilakrayal

I was able to fix this by adding casts to the appropriate dtype in the build and call methods (lines 219-225) of the custom layer LayerScale, as follows:

class LayerScale(layers.Layer):
    """Layer scale module.

    References:
      - https://arxiv.org/abs/2103.17239

    Args:
      init_values (float): Initial value for layer scale. Should be within
        [0, 1].
      projection_dim (int): Projection dimensionality.

    Returns:
      Tensor multiplied to the scale.
    """

    def __init__(self, init_values, projection_dim, **kwargs):
        super().__init__(**kwargs)
        self.init_values = init_values
        self.projection_dim = projection_dim

    def build(self, input_shape):
        self.gamma = tf.Variable(
            self.init_values * tf.ones((self.projection_dim,))
        )
        if self.gamma.dtype.base_dtype != self._compute_dtype_object.base_dtype:
            self.gamma = tf.cast(self.gamma, dtype=self._compute_dtype_object)

    def call(self, x):
        if x.dtype.base_dtype != self._compute_dtype_object.base_dtype:
            x = tf.cast(x, dtype=self._compute_dtype_object)
        return x * self.gamma

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "init_values": self.init_values,
                "projection_dim": self.projection_dim,
            }
        )
        return config
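As a quick sanity check (illustrative only, assuming the patched LayerScale above), the layer should now run under mixed precision:

import tensorflow as tf

tf.keras.mixed_precision.set_global_policy("mixed_float16")

layer = LayerScale(init_values=1e-6, projection_dim=96)  # 1e-6 is the usual ConvNeXt default
out = layer(tf.ones((1, 4, 4, 96), dtype=tf.float16))
print(out.dtype)  # float16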

zibbini avatar Oct 04 '22 16:10 zibbini

I was able to fix this by adding casts to the appropriate dtype in the build and call methods

Interesting, @zibbini. LGTM. Can run some checks tomorrow.

andreped avatar Oct 04 '22 17:10 andreped

@zibbini Just ran some experiments, but ran into more issues. See gist.

After adding your modifications, it seems to work, but it fails to load the pretrained weights.

andreped avatar Oct 07 '22 12:10 andreped

@andreped I managed to fix those issues with the pre-trained weights by initialising the gamma variable in LayerScale with the correct dtype, rather than adding the casts (see gist). Casting the variable in build replaces it with a plain tensor, which appears to be what broke the weight loading:

class LayerScale(layers.Layer):
    """Layer scale module.

    References:
      - https://arxiv.org/abs/2103.17239

    Args:
      init_values (float): Initial value for layer scale. Should be within
        [0, 1].
      projection_dim (int): Projection dimensionality.

    Returns:
      Tensor multiplied to the scale.
    """

    def __init__(self, init_values, projection_dim, **kwargs):
        super().__init__(**kwargs)
        self.init_values = init_values
        self.projection_dim = projection_dim

    def build(self, input_shape):
        self.gamma = tf.Variable(
            self.init_values * tf.ones((self.projection_dim,)),
            dtype=self._compute_dtype_object,
        )

    def call(self, x):
        return x * self.gamma

    def get_config(self):
        config = super().get_config()
        config.update(
            {
                "init_values": self.init_values,
                "projection_dim": self.projection_dim,
            }
        )
        return config
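For reference, a more idiomatic Keras pattern (a sketch, not necessarily what was merged upstream) is to create gamma via add_weight; this build method is a drop-in replacement for the one in the class above:

    def build(self, input_shape):
        # add_weight registers gamma as a proper layer weight: under a
        # mixed policy it is stored in float32 (the variable dtype) and
        # auto-cast to the compute dtype (float16) when read in call().
        self.gamma = self.add_weight(
            name="gamma",
            shape=(self.projection_dim,),
            initializer=tf.keras.initializers.Constant(self.init_values),
            trainable=True,
        )

This keeps master weights in float32 for training stability while still matching the checkpoint layout for weight loading.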

zibbini avatar Oct 15 '22 09:10 zibbini