[BUG] Chapter 13: Solution to Exercise 10 might have an issue (paragraph e)
I am completely new to the material, so I could be wrong, but I think the solution to Exercise 10 has an issue. The solution presents the following function:
```python
def compute_mean_embedding(inputs):
    not_pad = tf.math.count_nonzero(inputs, axis=-1)
    n_words = tf.math.count_nonzero(not_pad, axis=-1, keepdims=True)
    sqrt_n_words = tf.math.sqrt(tf.cast(n_words, tf.float32))
    return tf.reduce_sum(inputs, axis=1) / sqrt_n_words
```
This function is supposed to count the number of words by counting non-zero embedding vectors, and then divide the summed embeddings by the square root of that count. However, it is plugged into the model right after the `Embedding` layer, like so:
```python
embedding_size = 20
tf.random.set_seed(42)

text_vectorization = tf.keras.layers.TextVectorization(
    max_tokens=max_tokens, output_mode="int")
text_vectorization.adapt(sample_reviews)

model = tf.keras.Sequential([
    text_vectorization,
    tf.keras.layers.Embedding(input_dim=max_tokens,
                              output_dim=embedding_size,
                              mask_zero=True),  # <pad> tokens => zero vectors
    tf.keras.layers.Lambda(compute_mean_embedding),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```
From my tests, it does not seem that `mask_zero=True` guarantees that the padding token 0 (the output of the `TextVectorization` layer) is embedded to a zero vector by the `Embedding` layer: as far as I can tell, the embedding row for token id 0 is an ordinary trainable vector like any other, so counting non-zero embedding vectors also counts the padding positions. So in fact, I'm not sure the function correctly counts the number of words. If I am correct, it always uses the square root of the length of the longest sentence in the batch.
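A minimal check along these lines (the layer sizes and token ids below are just made up for illustration) seems to confirm this:

```python
import tensorflow as tf

# Toy embedding layer set up like the one in the notebook's model.
emb = tf.keras.layers.Embedding(input_dim=10, output_dim=4, mask_zero=True)

# One "review" with 2 real tokens followed by 2 padding tokens (id 0).
token_ids = tf.constant([[3, 7, 0, 0]])
embeddings = emb(token_ids)  # shape (1, 4, 4)

# The rows at the padding positions are not zero vectors: they are just the
# (randomly initialized, trainable) embedding of token id 0.
print(embeddings[0, 2:])

# So counting non-zero embedding vectors, as compute_mean_embedding does,
# counts the padding positions too:
not_pad = tf.math.count_nonzero(embeddings, axis=-1)
n_words = tf.math.count_nonzero(not_pad, axis=-1, keepdims=True)
print(n_words)  # [[4]], i.e. the padded length, not the 2 actual words
```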
Here is my suggested solution. The layer below should replace both the `Embedding` and `Lambda` layers in the notebook's solution. Note, however, that when I tested a model using this layer, it did not achieve better accuracy than the notebook's solution with the same dataset and hyperparameters.
```python
class SentenceEmbedding(tf.keras.layers.Layer):
    def __init__(self, input_dim, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.embedding_layer = tf.keras.layers.Embedding(
            input_dim=self.input_dim,
            output_dim=self.output_dim,
            mask_zero=True
        )

    def call(self, inputs):
        # Count the words from the token ids themselves (0 = <pad>),
        # not from the embedding vectors.
        n_words = tf.math.count_nonzero(inputs, axis=-1, keepdims=True,
                                        dtype=tf.float32)
        embeddings_output = self.embedding_layer(inputs)
        return tf.reduce_sum(embeddings_output, axis=1) / tf.sqrt(n_words)

    def get_config(self):
        base_config = super().get_config()
        return {
            **base_config,
            'input_dim': self.input_dim,
            'output_dim': self.output_dim
        }
```
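For completeness, this is roughly how I would plug it into the notebook's model, reusing `max_tokens`, `sample_reviews` and the other settings from above (the compile arguments here are only placeholders, not the notebook's exact ones):

```python
embedding_size = 20
tf.random.set_seed(42)

text_vectorization = tf.keras.layers.TextVectorization(
    max_tokens=max_tokens, output_mode="int")
text_vectorization.adapt(sample_reviews)

model = tf.keras.Sequential([
    text_vectorization,
    SentenceEmbedding(input_dim=max_tokens, output_dim=embedding_size),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Compile and fit as in the notebook; these settings are just an example.
model.compile(loss="binary_crossentropy", optimizer="nadam",
              metrics=["accuracy"])
```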