Mask that had been removed from a tensor re-appears when building a new model with placeholders
System information.
- Have I written custom code (as opposed to using a stock example script provided in Keras): yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Colab
- TensorFlow version (use command below): v2.8.0-0-g3f878cff5b6 2.8.0
- Python version: 3.7.13
Describe the problem.
Please see a Colab notebook here.
I am trying to process a batch of sequences using an LSTM. The sequences all have different lengths, so I want to mask the padding tokens, which I do by passing a mask= argument to the keras.layers.LSTM call.

After the LSTM I do some postprocessing that involves a custom normalization layer; a simplified version of it is included in the reproducing code below. This layer intentionally does not support masking, since implementing masking correctly for all cases is nontrivial. In this particular case I apply the normalization to each token separately (using TimeDistributed), so I intentionally strip the mask from the tensor first, using a custom layer with a trivial compute_mask().

This works when I build the model for the first time. However, when I later try to pass new placeholders into the model, I get an exception saying that NormalizationLayer does not support masking, even though I seem to have removed the mask explicitly.
import tensorflow as tf

class NormalizationLayer(tf.keras.layers.Layer):
    BLOCK_TYPE_TO_CLS = {
        'batchnorm': tf.keras.layers.BatchNormalization,
        'layernorm': tf.keras.layers.LayerNormalization,
    }

    def __init__(self, block_type: str, block_kwargs, **kwargs):
        super(NormalizationLayer, self).__init__(**kwargs)
        if block_type in self.BLOCK_TYPE_TO_CLS:
            self._layer = self.BLOCK_TYPE_TO_CLS[block_type](name=f'{self.name}/{block_type}', **block_kwargs)
        else:
            raise ValueError(f'Unknown block_type: {block_type}')

    def call(self, inputs):
        return self._layer(inputs)

    def compute_output_shape(self, input_shape):
        return input_shape

class RemoveMaskLayer(tf.keras.layers.Layer):
    def compute_mask(self, inputs, previous_mask):
        return None

dim = 16

# Create layers
lstm = tf.keras.layers.LSTM(dim, return_sequences=True, name='lstm_layer')
remove_mask = RemoveMaskLayer(name='remove_mask_layer')
norm_layer = NormalizationLayer(block_type='layernorm', block_kwargs={'axis': -1}, name='normalization_layer')
norm_layer_td = tf.keras.layers.TimeDistributed(norm_layer, name='time_distributed_layer')

# Create model inputs
inp1 = tf.keras.Input(shape=(None, dim), dtype=tf.float32, name='sequence_input')
inp2 = tf.keras.Input(shape=(), dtype=tf.int32, name='sizes_input')

# Turn sequence sizes into a mask and run the LSTM
mask = tf.sequence_mask(inp2)
x = lstm(inp1, mask=mask)

# Here x has an attribute _keras_mask, which breaks NormalizationLayer, so we remove it
assert tf.is_tensor(x._keras_mask)
x = remove_mask(x)
assert not hasattr(x, '_keras_mask')

# Run normalization on each embedding at each timestep separately
x = norm_layer_td(x)

# Build model
model = tf.keras.Model(inputs=[inp1, inp2], outputs=x, name='my_model')

# Make new inputs
inp_ext_1 = tf.keras.Input(shape=(None, dim), dtype=tf.float32, name='sequence_input_external')
inp_ext_2 = tf.keras.Input(shape=(), dtype=tf.int32, name='sizes_input_external')

# Apply model to new inputs
model([inp_ext_1, inp_ext_2])
Describe the current behavior.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-541ab0358593> in <module>()
56
57 # Apply model to new inputs
---> 58 model([inp_ext_1, inp_ext_2])
<...>
TypeError: Exception encountered when calling layer "my_model" (type Functional).
Layer normalization_layer does not support masking, but was passed an input_mask: Tensor("my_model/time_distributed_layer/Reshape_2:0", shape=(None,), dtype=bool)
Describe the expected behavior.
No exception.
Maybe a dumb question, but why not just add a no-op compute_mask method to NormalizationLayer itself?
Essentially, do inside NormalizationLayer what you did for RemoveMaskLayer.
Why explicitly create a RemoveMask layer?
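A minimal sketch of that suggestion, assuming the NormalizationLayer from the reproducer above is in scope (the subclass name is illustrative): overriding compute_mask as a no-op, exactly as RemoveMaskLayer does, keeps the default Layer.compute_mask from raising and stops the mask from propagating further.

class MaskDroppingNormalizationLayer(NormalizationLayer):
    def compute_mask(self, inputs, previous_mask=None):
        # No-op, same as RemoveMaskLayer: drop any incoming mask instead of
        # letting the default Layer.compute_mask raise a TypeError.
        return None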
@AshwinJay101 I want to minimize the likelihood of unintentionally applying normalization to a tensor that carries a mask. Since neither BatchNormalization nor LayerNormalization supports general masks, that could introduce subtle and hard-to-catch bugs. In this particular case, however, dropping the mask and normalizing anyway is correct, since the normalization is applied over a dimension that is either entirely masked or entirely unmasked.
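A minimal sketch of that argument, assuming LayerNormalization with axis=-1 as in the reproducer: the normalization only mixes values within a single timestep's embedding, so padded timesteps cannot change the output at real timesteps.

import numpy as np
import tensorflow as tf

ln = tf.keras.layers.LayerNormalization(axis=-1)
real = np.random.rand(1, 3, 16).astype('float32')                         # 3 real timesteps
padded = np.concatenate([real, np.zeros((1, 2, 16), 'float32')], axis=1)  # plus 2 padding timesteps

# The first 3 timesteps are normalized identically with or without padding.
np.testing.assert_allclose(ln(real).numpy(), ln(padded).numpy()[:, :3], rtol=1e-5)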
@gadagashwini I was able to replicate this issue on Colab using TF v2.8.0 and tf-nightly, please find the gist here. Thanks!
@dniku, Hi, thanks for reporting this issue. Is this similar to #15451? Thanks!
@gadagashwini no, that's a different issue. Here I describe a bug somewhere in the Functional model interface; #15451 is just a feature request.
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.
Still a problem.