Why is the loss function (mse) calculated by Keras not the same as mine?
I want to verify Keras' mse loss function by computing it myself. However, the calculated answers are different. The definition of MSE is here: https://en.wikipedia.org/wiki/Mean_squared_error
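In symbols, that definition is MSE = (1/n) * Σᵢ (yᵢ − ŷᵢ)², i.e. the mean of the squared differences between the targets yᵢ and the predictions ŷᵢ.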
The test code is below:
import numpy as np
from keras.datasets import boston_housing
from keras import models
from keras import layers

(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()
x_train = train_data.astype(np.float32)     # shape (404, 13)
y_train = train_targets.astype(np.float32)  # shape (404,)
# y_test = test_targets.astype(np.float32)

# Small regression network on the 13 Boston Housing features.
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(13,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])

# One epoch, whole training set as a single batch, then a manual MSE check.
model.fit(x_train, y_train, epochs=1, batch_size=404)
print(np.mean((y_train - model.predict(x_train).ravel()) ** 2))
Keras reports a loss of around 816, but computing the MSE by hand from the definition gives around 704. Why are the results different?
I think the issue here is that the model output shape is [404, 1] while the label shape is [404]. MSE produces an incorrect value because the labels and predictions don't have the same shape. It doesn't error out because the labels can be broadcast against the predictions (to [404, 404] in this case), which is probably the cause of the discrepancy.
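To illustrate the broadcast, here is a minimal NumPy sketch using random stand-in data (not the actual Boston Housing values):

import numpy as np

y_true = np.random.rand(404)     # labels, shape (404,)
y_pred = np.random.rand(404, 1)  # model output, shape (404, 1)

# With matching shapes the difference is (404,); with mismatched shapes
# broadcasting expands it to (404, 404): every label is subtracted from
# every prediction.
print((y_true - y_pred.ravel()).shape)  # (404,)
print((y_true - y_pred).shape)          # (404, 404)

# Averaging over the (404, 404) matrix gives a different number than the MSE.
print(np.mean((y_true - y_pred.ravel()) ** 2))
print(np.mean((y_true - y_pred) ** 2))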
If you add a line to expand the dims of y_train, such as y_train = np.expand_dims(y_train, axis=1), then model.fit/evaluate and the raw NumPy calculation should all produce the same number.
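Concretely, continuing from the snippet in the question (a minimal sketch; model, x_train, and y_train are as defined there):

import numpy as np

y_train = np.expand_dims(y_train, axis=1)  # (404,) -> (404, 1), matches model output

model.fit(x_train, y_train, epochs=1, batch_size=404)

# Both operands are now (404, 1), so no broadcasting occurs and the manual
# MSE agrees with the loss reported by model.fit/model.evaluate.
print(np.mean((y_train - model.predict(x_train)) ** 2))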
I think this is a tricky error and an easy pitfall for end users. We should either error out when there is a shape mismatch, or broadcast both to the same shape.
@qlzh727 the current behavior squeezes both tensors ([batch] and [batch, 1]) into the shape [batch], which is then incorrectly reduced within mse.
This problem is aggravated if the user has a custom train_step and a custom Mean metric wrapping mse, since there is no guarantee they will call losses_utils.squeeze_or_expand_dimensions, ultimately producing a different result from the one obtained for the loss.
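One way to sidestep the pitfall in such custom code is to align shapes explicitly before reducing, instead of relying on internal squeeze/expand behavior. A minimal sketch (safe_mse is a hypothetical helper, not a Keras API):

import tensorflow as tf

def safe_mse(y_true, y_pred):
    # Force y_true into y_pred's shape before reducing, so a [batch] vs
    # [batch, 1] mismatch can never broadcast to [batch, batch].
    y_true = tf.reshape(tf.cast(y_true, y_pred.dtype), tf.shape(y_pred))
    return tf.reduce_mean(tf.square(y_true - y_pred))

Using the same helper for both the loss and the metric keeps the two numbers consistent regardless of the label shape.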