
Why is the loss function (mse) calculated by Keras not the same as mine?

Open yangzehao opened this issue 4 years ago • 3 comments

I wanted to check Keras's mse loss function against my own calculation, but the two answers differ. The definition of MSE is here: https://en.wikipedia.org/wiki/Mean_squared_error
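In numpy terms, that definition is simply the mean of the squared residuals:

import numpy as np

# MSE = (1/n) * sum_i (y_i - yhat_i)^2
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)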

The test code is below:

import numpy as np
from keras import layers, models
from keras.datasets import boston_housing

# Load the Boston housing regression data (404 training samples, 13 features).
(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()
x_train = train_data.astype(np.float32)
y_train = train_targets.astype(np.float32)

# A small MLP regressor with a single scalar output.
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(13,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])

# One epoch over the whole training set as a single batch.
model.fit(x_train, y_train, epochs=1, batch_size=404)

# Recompute MSE by hand from the definition.
print(np.mean((y_train - model.predict(x_train).ravel()) ** 2))

Keras reports a loss of around 816, but computing MSE from the definition gives around 704. Why are the results different?

yangzehao avatar May 01 '21 16:05 yangzehao

I think the issue here is that the model output shape is [404, 1] while the label shape is [404]. MSE produces an incorrect value because the label and prediction don't have the same shape. It doesn't error out because the label can be broadcast against the prediction, yielding a [404, 404] difference matrix, which is probably the cause of the discrepancy.
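To make the broadcast concrete, here is a minimal numpy sketch (shapes shrunk from 404 to 3 for readability):

import numpy as np

y_true = np.array([1., 2., 3.])           # labels, shape (3,)
y_pred = np.array([[1.], [2.], [4.]])     # model output, shape (3, 1)

# Intended per-sample errors are [0, 0, 1], so the true MSE is 1/3:
print(np.mean((y_true - y_pred.ravel()) ** 2))   # 0.333...

# But (3,) - (3, 1) silently broadcasts to a (3, 3) matrix, comparing
# every label against every prediction:
diff = y_true - y_pred
print(diff.shape)                # (3, 3)
print(np.mean(diff ** 2))        # 2.333... -- a different number entirely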

If you add a line to expand the dimensions of y_train, such as y_train = np.expand_dims(y_train, axis=1), then model.fit/evaluate and the raw numpy calculation should all produce the same number.
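A minimal sketch of that fix (shown with tf.keras; the numbers are for the toy shapes above):

import numpy as np
import tensorflow as tf

y_true = np.array([1., 2., 3.], dtype=np.float32)        # shape (3,)
y_pred = np.array([[1.], [2.], [4.]], dtype=np.float32)  # shape (3, 1)

# Give the labels the same trailing dimension as the predictions.
y_true = np.expand_dims(y_true, axis=1)                  # (3,) -> (3, 1)

keras_mse = tf.keras.losses.MeanSquaredError()(y_true, y_pred).numpy()
numpy_mse = np.mean((y_true - y_pred) ** 2)
print(keras_mse, numpy_mse)   # both 0.333...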

qlzh727 avatar Aug 20 '21 18:08 qlzh727

I think this is a tricky error and an easy pitfall for end users. We should either raise an error when there is a shape mismatch, or broadcast both tensors to the same shape.

qlzh727 avatar Aug 20 '21 18:08 qlzh727

@qlzh727 the current behavior is squeezing both tensors ([batch] and [batch, 1]) into the shape [batch], which is incorrectly reduced within mse. The problem is aggravated if the user has a custom train_step and a custom Mean metric wrapping mse, as there is no guarantee they will call losses_utils.squeeze_or_expand_dimensions, ultimately producing a different result than the one obtained for the loss.
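A rough sketch of the two code paths (the exact internals vary across Keras versions; tf.squeeze stands in here for the losses_utils.squeeze_or_expand_dimensions step):

import tensorflow as tf

y_true = tf.constant([1., 2., 3.])          # shape (3,)
y_pred = tf.constant([[1.], [2.], [4.]])    # shape (3, 1)

# Loss path: shapes are reconciled before the reduction.
aligned = tf.keras.losses.mse(y_true, tf.squeeze(y_pred, axis=-1))
print(tf.reduce_mean(aligned).numpy())      # 0.333...

# Naive custom metric wrapping mse: shapes broadcast to (3, 3) first.
naive = tf.keras.losses.mse(y_true, y_pred)
print(tf.reduce_mean(naive).numpy())        # 2.333...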

lucasdavid avatar Jul 29 '22 01:07 lucasdavid

@lucasdavid, with the latest TensorFlow and Keras, the mse loss now comes out to around 525. Kindly find the gist of it here. Thank you!

tilakrayal avatar Apr 17 '25 04:04 tilakrayal

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar May 05 '25 02:05 github-actions[bot]

This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.

github-actions[bot] avatar May 19 '25 02:05 github-actions[bot]