
Choosing activation layer data type

Open phborba opened this issue 4 years ago • 2 comments

Hi, first of all I would like to thank you for your work, @qubvel. The package is awesome and very easy to use.

I was trying to use TensorFlow mixed precision but I could not get it working with segmentation_models. I first tried simply enabling

tf.config.optimizer.set_experimental_options(
    {"auto_mixed_precision": True}
)

and defining a LossScaleOptimizer

opt = tf.keras.optimizers.Adam(learning_rate=0.01)
opt = tf.keras.mixed_precision.experimental.LossScaleOptimizer(opt, "dynamic")

but the training was a mess. I tried with the same dataset I had used before, with the same number of epochs and the same batch size, but it did not converge at all. After some reading, I discovered that the TensorFlow docs suggest the output activations should be dtype='float32' when using mixed precision.
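
For reference, the pattern from the TensorFlow mixed precision guide looks like the sketch below. The toy model is made up for illustration, and I am using the experimental policy API that is current as of this writing; the point is only that the final Activation layer is forced to float32 while the rest of the network runs in float16 under the policy:

import tensorflow as tf

# Enable mixed precision globally (experimental API as of TF 2.1/2.2).
policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)

inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu')(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(10)(x)  # computations run in float16 under the policy
# Keep the output activation numerically stable by forcing float32.
outputs = tf.keras.layers.Activation('softmax', dtype='float32')(x)
model = tf.keras.Model(inputs, outputs)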

Would it be possible to add an optional parameter to choose the activation dtype when building models? The idea is that the default activation behaviour would remain the same, and only if the activation_data_type parameter were provided would a dtype be assigned to the activation layer.

The code snippet below illustrates my suggestion, applied to the function conv2d_bn (https://github.com/qubvel/segmentation_models/blob/94f624b7029deb463c859efbd92fa26f512b52b8/segmentation_models/backbones/inception_resnet_v2.py#L41)

def conv2d_bn(x,
              filters,
              kernel_size,
              strides=1,
              padding='same',
              activation='relu',
              activation_data_type=None,
              use_bias=False,
              name=None):
    """Utility function to apply conv + BN.
    # Arguments
        x: input tensor.
        filters: filters in `Conv2D`.
        kernel_size: kernel size as in `Conv2D`.
        strides: strides in `Conv2D`.
        padding: padding mode in `Conv2D`.
        activation: activation in `Conv2D`.
        activation_data_type: optional dtype for the activation layer;
            if `None`, the layer keeps its default dtype.
        use_bias: whether to use a bias in `Conv2D`.
        name: name of the ops; will become `name + '_ac'` for the activation
            and `name + '_bn'` for the batch norm layer.
    # Returns
        Output tensor after applying `Conv2D` and `BatchNormalization`.
    """
    x = layers.Conv2D(filters,
                      kernel_size,
                      strides=strides,
                      padding=padding,
                      use_bias=use_bias,
                      name=name)(x)
    if not use_bias:
        bn_axis = 1 if backend.image_data_format() == 'channels_first' else 3
        bn_name = None if name is None else name + '_bn'
        x = layers.BatchNormalization(axis=bn_axis,
                                      scale=False,
                                      name=bn_name)(x)
    if activation is not None:
        ac_name = None if name is None else name + '_ac'
        if activation_data_type is None:
            x = layers.Activation(activation, name=ac_name)(x)
        else:
            x = layers.Activation(activation, name=ac_name,
                                  dtype=activation_data_type)(x)
    return x
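
For illustration, a call under this proposal could look like the following. The filter counts and the softmax head are invented for the example; existing calls stay untouched, and only the final layer opts into a float32 activation:

# Existing calls keep their current behaviour
# (activation_data_type defaults to None).
x = conv2d_bn(x, 32, 3, activation='relu')

# The final activation opts into float32 for mixed precision stability.
x = conv2d_bn(x, 2, 1, activation='softmax', activation_data_type='float32')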

My main concern is not to change existing use cases, which is why I made the data type parameter optional: the behaviour of each method only changes if the user explicitly opts in.

If you accept these suggestions, I could help out by implementing them and opening a pull request. What do you think?

phborba avatar Apr 18 '20 18:04 phborba

@qubvel, would you please look into this? It's quite important: mixed precision training fails without this fix.

innat avatar Dec 29 '21 16:12 innat

Do you think #536 would help here?

romitjain avatar Jul 27 '22 07:07 romitjain