tf-keras
Why is the loss function (MSE) calculated by Keras not the same as mine?
I wanted to verify Keras's MSE loss function myself, but the calculated answers are different. The definition of MSE is here: https://en.wikipedia.org/wiki/Mean_squared_error
The test code is below:
from keras.datasets import boston_housing
from keras import models
from keras import layers
import numpy as np

(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()
x_train = train_data.astype(np.float32)
y_train = train_targets.astype(np.float32)
# y_test = test_targets.astype(np.float32)

model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(13,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])

model.fit(x_train, y_train, epochs=1, batch_size=404)

# Recompute MSE by hand from the definition
print(np.mean((y_train - model.predict(x_train).ravel()) ** 2))
Keras reports a loss of around 816, but computing MSE from its definition gives around 704. Why are the results different?
I think the issue here is that the model output shape is [404, 1] while the label shape is [404]. The MSE produces an incorrect value because the label and prediction don't have the same shape. It doesn't error out since the label can be broadcast against the prediction's shape (to [404, 404] in this case, which is probably the cause of the discrepancy).
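The broadcasting behavior described above can be reproduced with plain NumPy. This is a minimal sketch with made-up values standing in for the Boston Housing targets and predictions, not actual model output:

```python
import numpy as np

# Labels shaped like train_targets: (batch,)
y_true = np.array([1.0, 2.0, 3.0, 4.0])          # shape (4,)
# Predictions shaped like model.predict's output: (batch, 1)
y_pred = np.array([[1.5], [2.5], [3.5], [4.5]])  # shape (4, 1)

# Subtracting a (4,) array from a (4, 1) array broadcasts to (4, 4),
# so the mean runs over 16 pairwise differences instead of 4 per-sample ones.
diff = y_true - y_pred
print(diff.shape)                    # (4, 4)
print(np.mean(diff ** 2))            # 2.75  -- silently wrong

# With matching shapes, the per-sample MSE comes out as expected.
print(np.mean((y_true - y_pred.ravel()) ** 2))   # 0.25
```

The same silent shape mismatch is what inflates the loss in the original script.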
If you add a line to expand the dims of y_train, such as "y_train = np.expand_dims(y_train, axis=1)", then model.fit/evaluate and the raw NumPy calculation should all produce the same number.
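Continuing the toy-value sketch from above, expanding the labels to rank 2 makes the elementwise subtraction line up:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])          # (4,)
y_pred = np.array([[1.5], [2.5], [3.5], [4.5]])  # (4, 1)

# Give the labels the same (batch, 1) shape as the predictions.
y_true_2d = np.expand_dims(y_true, axis=1)
print(y_true_2d.shape)                           # (4, 1)

# Now the subtraction is elementwise, no spurious broadcasting.
print(np.mean((y_true_2d - y_pred) ** 2))        # 0.25
```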
I think this is a tricky error and an easy pitfall for end users. We should either error out when there is a shape mismatch, or broadcast both tensors to the same shape.
@qlzh727 current behavior is squeezing both tensors ([batch] and [batch, 1]) into the shape [batch], which is incorrectly reduced within mse.
This problem is aggravated if the user has a custom train_step and a custom Mean metric wrapping mse, as there's no guarantee they will call losses_utils.squeeze_or_expand_dimensions, ultimately producing a different result than the one obtained for the loss.
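The divergence can be seen by comparing the Loss class, which reconciles shapes internally, against the bare functional form, which does not. A small sketch with illustrative values (assuming TensorFlow's bundled Keras):

```python
import tensorflow as tf

y_true = tf.constant([1.0, 2.0, 3.0, 4.0])          # [batch]
y_pred = tf.constant([[1.5], [2.5], [3.5], [4.5]])  # [batch, 1]

# The Loss class squeezes/expands dims before reducing, so shapes match.
loss_obj = tf.keras.losses.MeanSquaredError()
print(float(loss_obj(y_true, y_pred)))  # 0.25

# The bare function broadcasts [batch] against [batch, 1] to [batch, batch]
# before the last-axis mean, e.g. inside a hand-rolled Mean metric.
print(float(tf.reduce_mean(tf.keras.losses.mse(y_true, y_pred))))  # 2.75
```

So a metric built on the raw function can silently report a different number than the compiled loss for the exact same tensors.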
@lucasdavid, with the latest TensorFlow and Keras, the mse loss comes out around 525. Kindly find the gist of it here. Thank you!
This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.
This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.