
Mask that had been removed from a tensor re-appears when building a new model with placeholders

Open dniku opened this issue 3 years ago • 7 comments

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Colab
  • TensorFlow version (use command below): v2.8.0-0-g3f878cff5b6 2.8.0
  • Python version: 3.7.13

Describe the problem.

Please see a Colab notebook here.

I am trying to process a batch of sequences using an LSTM. These sequences all have different lengths, so I want to mask padding tokens. To do that, I pass a mask= argument to the keras.layers.LSTM call. After that, I do some postprocessing, which involves running a custom normalization layer, a simplified version of which is included in the reproducing code below. This normalization layer intentionally does not support masking, since implementing it correctly for all cases is nontrivial. Since in this particular case I apply normalization to each token separately (using TimeDistributed), I intentionally remove the mask from the tensor with a custom layer that has a trivial compute_mask(). This works when I build the model for the first time; however, when I later try to pass new placeholders into the model, I get an exception saying that NormalizationLayer does not support masking, even though I removed the mask explicitly.

import tensorflow as tf

# Wraps a Keras normalization layer; intentionally does not implement masking support.
class NormalizationLayer(tf.keras.layers.Layer):
    BLOCK_TYPE_TO_CLS = {
        'batchnorm': tf.keras.layers.BatchNormalization,
        'layernorm': tf.keras.layers.LayerNormalization,
    }

    def __init__(self, block_type: str, block_kwargs, **kwargs):
        super(NormalizationLayer, self).__init__(**kwargs)

        if block_type in self.BLOCK_TYPE_TO_CLS:
            self._layer = self.BLOCK_TYPE_TO_CLS[block_type](name=f'{self.name}/{block_type}', **block_kwargs)
        else:
            raise ValueError(f'Unknown block_type: {block_type}')

    def call(self, inputs):
        return self._layer(inputs)

    def compute_output_shape(self, input_shape):
        return input_shape
    
# Identity layer whose compute_mask() returns None, stripping any incoming mask.
class RemoveMaskLayer(tf.keras.layers.Layer):
    def compute_mask(self, inputs, previous_mask):
        return None

dim = 16

# Create layers
lstm = tf.keras.layers.LSTM(dim, return_sequences=True, name='lstm_layer')
remove_mask = RemoveMaskLayer(name='remove_mask_layer')
norm_layer = NormalizationLayer(block_type='layernorm', block_kwargs={'axis': -1}, name='normalization_layer')
norm_layer_td = tf.keras.layers.TimeDistributed(norm_layer, name='time_distributed_layer')

# Create model inputs
inp1 = tf.keras.Input(shape=(None, dim), dtype=tf.float32, name='sequence_input')
inp2 = tf.keras.Input(shape=(), dtype=tf.int32, name='sizes_input')

# Turn sequence sizes into mask and run LSTM
mask = tf.sequence_mask(inp2)
x = lstm(inp1, mask=mask)

# Here x has an attribute _keras_mask, which breaks NormalizationLayer, so we remove it
assert tf.is_tensor(x._keras_mask)
x = remove_mask(x)
assert not hasattr(x, '_keras_mask')

# Run normalization on each embedding at each timestep separately
x = norm_layer_td(x)

# Build model
model = tf.keras.Model(inputs=[inp1, inp2], outputs=x, name='my_model')

# Make new inputs
inp_ext_1 = tf.keras.Input(shape=(None, dim), dtype=tf.float32, name='sequence_input_external')
inp_ext_2 = tf.keras.Input(shape=(), dtype=tf.int32, name='sizes_input_external')

# Apply model to new inputs
model([inp_ext_1, inp_ext_2])

Describe the current behavior.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-541ab0358593> in <module>()
     56 
     57 # Apply model to new inputs
---> 58 model([inp_ext_1, inp_ext_2])

<...>

TypeError: Exception encountered when calling layer "my_model" (type Functional).

Layer normalization_layer does not support masking, but was passed an input_mask: Tensor("my_model/time_distributed_layer/Reshape_2:0", shape=(None,), dtype=bool)

Describe the expected behavior.

No exception.

dniku avatar Apr 20 '22 16:04 dniku

Maybe a dumb question, but why not just add a no-op compute_mask to NormalizationLayer itself?

Essentially what you did for RemoveMask, but inside NormalizationLayer.

Why explicitly create a RemoveMask layer?
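
For example, something along these lines (a minimal sketch, assuming silently dropping the mask is acceptable in your case):

class NormalizationLayer(tf.keras.layers.Layer):
    ...

    def compute_mask(self, inputs, mask=None):
        # Overriding compute_mask bypasses the default "does not support
        # masking" check and simply discards any incoming mask.
        return None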

AshwinJay101 avatar Apr 21 '22 13:04 AshwinJay101

@AshwinJay101 I want to minimize the likelihood of unintentionally applying normalization to a tensor with a mask. Since neither BatchNormalization nor LayerNormalization supports general masks, that could introduce subtle, hard-to-catch bugs. However, in this particular case using a mask is correct, since normalization is applied over a dimension that is either entirely masked or entirely unmasked.
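
For example, with LayerNormalization over axis=-1, appending padded timesteps does not change the outputs for the real ones (a minimal sketch, separate from the model in the report):

import tensorflow as tf

ln = tf.keras.layers.LayerNormalization(axis=-1)
seq = tf.random.normal((1, 4, 8))                       # 4 real timesteps, dim 8
padded = tf.concat([seq, tf.zeros((1, 2, 8))], axis=1)  # plus 2 padding timesteps

# Each timestep is normalized independently over its feature dimension,
# so the real timesteps are identical with or without padding appended.
tf.debugging.assert_near(ln(seq), ln(padded)[:, :4])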

dniku avatar Apr 21 '22 14:04 dniku

@gadagashwini I was able to replicate this issue on Colab using TF v2.8.0 and tf-nightly; please find the gist here. Thanks!

sushreebarsa avatar Apr 24 '22 17:04 sushreebarsa

@dniku, hi, thanks for reporting this issue. Is this similar to #15451? Thanks!

gadagashwini avatar Apr 28 '22 07:04 gadagashwini

@gadagashwini no, that's a different issue. Here I am describing a bug somewhere in the functional Model interface; #15451 is just a feature request.

dniku avatar Apr 28 '22 08:04 dniku

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] avatar May 05 '22 08:05 google-ml-butler[bot]

Still a problem.

dniku avatar May 05 '22 10:05 dniku