evidential-deep-learning
Loss goes to NaN
For a regression task, I am using a mid-size CNN with Conv and MaxPool layers in the early part of the network and Dense layers at the end.
This is how I integrate the evidential loss (previously I used an MSE loss):
import tensorflow as tf
import evidential_deep_learning as edl

optimizer = tf.keras.optimizers.Adam(learning_rate=7e-7)

def EvidentialRegressionLoss(true, pred):
    return edl.losses.EvidentialRegression(true, pred, coeff=CONFIG.EDL_COEFF)

model.compile(
    optimizer=optimizer,
    loss=EvidentialRegressionLoss,
    metrics=["mae"]
)
This is how I integrated the DenseNormalGamma layer:
# ... many Conv/MaxPool layers before this point (omitted)
model.add(layers.Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(layers.Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(1024, activation="relu"))
model.add(layers.Dense(128, activation="relu"))
model.add(edl.layers.DenseNormalGamma(1)) # Instead of Dense(1)
return model
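For clarity, this is how the evidential head differs from Dense(1) in terms of output shape (a sketch only: the input shape below is made up, and the 4-way split reflects my understanding that DenseNormalGamma(1) outputs the four NIG parameters mu, v, alpha, beta instead of a single value):

import numpy as np

dummy_input = np.zeros((1, 64, 64, 3), dtype=np.float32)  # hypothetical input shape
output = model.predict(dummy_input)
print(output.shape)  # expected: (1, 4), i.e. four evidential parameters per target
mu, v, alpha, beta = np.split(output, 4, axis=-1)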
Here is the issue I am facing:
- Before introducing evidential-deep-learning, I used 0.0007 = 7e-4 as the learning rate, and it worked well.
- Now I get loss=NaN with this learning rate. If I make it smaller (7e-7), I still get loss=NaN, usually already in the very first epoch of training.
- If I set the learning rate ridiculously low (7e-9), I don't get NaN, but of course the network does not learn fast enough.
Is there any obvious mistake I am making? Any thoughts and help are appreciated.
This may be because of https://github.com/aamini/evidential-deep-learning/blob/7a22a2c8f35f5a2ec18fd37068b747935ff85376/evidential_deep_learning/losses/continuous.py#L35, where the log is not numerically safe: if its argument underflows to zero, the loss becomes -inf/NaN and the gradients follow.
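If that is the cause, one possible workaround on the caller's side (just a sketch: it assumes pred is the concatenated (mu, v, alpha, beta) output of DenseNormalGamma, and EPS is an arbitrary floor) is to clamp v and beta away from zero before passing the prediction to the library loss:

import tensorflow as tf

EPS = 1e-6  # arbitrary floor to keep the arguments of the logs strictly positive

def SafeEvidentialRegressionLoss(true, pred):
    # DenseNormalGamma concatenates (mu, v, alpha, beta) along the last axis
    mu, v, alpha, beta = tf.split(pred, 4, axis=-1)
    v = tf.maximum(v, EPS)
    beta = tf.maximum(beta, EPS)
    pred_safe = tf.concat([mu, v, alpha, beta], axis=-1)
    # edl and CONFIG.EDL_COEFF refer to the imports/constants from the question
    return edl.losses.EvidentialRegression(true, pred_safe, coeff=CONFIG.EDL_COEFF)

Then compile with loss=SafeEvidentialRegressionLoss instead of the original wrapper.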