Why is the loss function (mse) calculated by Keras not the same as mine?
I want to verify Keras' mse loss function by computing it myself. However, the calculated answers are different. The definition of MSE is here: https://en.wikipedia.org/wiki/Mean_squared_error
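In symbols, that definition is MSE = (1/n) * Σᵢ (yᵢ − ŷᵢ)², i.e. the mean of the squared differences between the targets yᵢ and the predictions ŷᵢ.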
The test code is below:
import numpy as np
from keras.datasets import boston_housing
from keras import models
from keras import layers

(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()
x_train = train_data.astype(np.float32)     # shape (404, 13)
y_train = train_targets.astype(np.float32)  # shape (404,)
# y_test = test_targets.astype(np.float32)

# Small regression network on the 13 Boston Housing features.
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(13,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])

# One epoch, whole training set as a single batch, then a manual MSE check.
model.fit(x_train, y_train, epochs=1, batch_size=404)
print(np.mean((y_train - model.predict(x_train).ravel()) ** 2))
Keras reports a loss of around 816, but computing the MSE by hand from the definition gives around 704. Why are the results different?
I think the issue here is that the model output shape is [404, 1] while the label shape is [404]. MSE produces an incorrect value because the labels and predictions don't have the same shape. It doesn't error out because the labels can be broadcast against the predictions (to [404, 404] in this case), which is probably the cause of the discrepancy.
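To illustrate the broadcast, here is a minimal NumPy sketch using random stand-in data (not the actual Boston Housing values):

import numpy as np

y_true = np.random.rand(404)     # labels, shape (404,)
y_pred = np.random.rand(404, 1)  # model output, shape (404, 1)

# With matching shapes the difference is (404,); with mismatched shapes
# broadcasting expands it to (404, 404): every label is subtracted from
# every prediction.
print((y_true - y_pred.ravel()).shape)  # (404,)
print((y_true - y_pred).shape)          # (404, 404)

# Averaging over the (404, 404) matrix gives a different number than the MSE.
print(np.mean((y_true - y_pred.ravel()) ** 2))
print(np.mean((y_true - y_pred) ** 2))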
If you add a line to expand the dims of y_train, such as y_train = np.expand_dims(y_train, axis=1), then model.fit/evaluate and the raw NumPy calculation should all produce the same number.
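Concretely, continuing from the snippet in the question (a minimal sketch; model, x_train, and y_train are as defined there):

import numpy as np

y_train = np.expand_dims(y_train, axis=1)  # (404,) -> (404, 1), matches model output

model.fit(x_train, y_train, epochs=1, batch_size=404)

# Both operands are now (404, 1), so no broadcasting occurs and the manual
# MSE agrees with the loss reported by model.fit/model.evaluate.
print(np.mean((y_train - model.predict(x_train)) ** 2))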
I think this is a tricky error and an easy pitfall for end users. We should either error out when there is a shape mismatch, or broadcast both to the same shape.
@qlzh727 the current behavior squeezes both tensors ([batch] and [batch, 1]) into the shape [batch], which is then incorrectly reduced within mse.
This problem is aggravated if the user has a custom train_step and a custom Mean metric wrapping mse, since there is no guarantee they will call losses_utils.squeeze_or_expand_dimensions, ultimately producing a different result from the one obtained for the loss.
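One way to sidestep the pitfall in such custom code is to align shapes explicitly before reducing, instead of relying on internal squeeze/expand behavior. A minimal sketch (safe_mse is a hypothetical helper, not a Keras API):

import tensorflow as tf

def safe_mse(y_true, y_pred):
    # Force y_true into y_pred's shape before reducing, so a [batch] vs
    # [batch, 1] mismatch can never broadcast to [batch, batch].
    y_true = tf.reshape(tf.cast(y_true, y_pred.dtype), tf.shape(y_pred))
    return tf.reduce_mean(tf.square(y_true - y_pred))

Using the same helper for both the loss and the metric keeps the two numbers consistent regardless of the label shape.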