
Loss goes to NaN

Open · markus-hinsche opened this issue 4 years ago · 2 comments

For a regression task, I am using a mid-size CNN: Conv and MaxPool layers in the early layers, followed by Dense layers at the end.

This is how I integrate the evidential loss (before, I used an MSE loss):

optimizer = tf.keras.optimizers.Adam(learning_rate=7e-7)  # already lowered from 7e-4, see below

def EvidentialRegressionLoss(true, pred):
    # Wrap the library loss so the regularization coefficient can come from config
    return edl.losses.EvidentialRegression(true, pred, coeff=CONFIG.EDL_COEFF)

model.compile(
    optimizer=optimizer,
    loss=EvidentialRegressionLoss,
    metrics=["mae"],
)
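
For context, as far as I understand, EvidentialRegression combines a Normal-Inverse-Gamma negative log-likelihood with an evidence regularizer weighted by coeff. Here is my rough sketch of the NLL term as described in Amini et al. (2020); this is my reading of the paper, not the library's exact code:

import numpy as np
import tensorflow as tf

# Sketch of the NIG negative log-likelihood (Amini et al., 2020).
# gamma is the predicted mean; v, alpha, beta are the evidential parameters.
# If v or beta underflow to 0, the bare tf.math.log calls return -inf,
# and the loss (and its gradients) turn into NaN.
def nig_nll(y, gamma, v, alpha, beta):
    omega = 2.0 * beta * (1.0 + v)
    return (0.5 * tf.math.log(np.pi / v)
            - alpha * tf.math.log(omega)
            + (alpha + 0.5) * tf.math.log(v * (y - gamma) ** 2 + omega)
            + tf.math.lgamma(alpha)
            - tf.math.lgamma(alpha + 0.5))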

This is how I integrated the DenseNormalGamma layer:

    # lots of ConvLayers
    model.add(layers.Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(layers.Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(1024, activation="relu"))
    model.add(layers.Dense(128, activation="relu"))

    model.add(edl.layers.DenseNormalGamma(1))  # Instead of Dense(1)

    return model
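
As far as I can tell, the layer outputs the four NIG parameters concatenated along the last axis, so the model now predicts shape (batch, 4) instead of (batch, 1). A hypothetical helper (not part of the library) for turning a prediction into a point estimate plus uncertainties, using the moment formulas from Amini et al. (2020):

import tensorflow as tf

# Hypothetical helper: split the DenseNormalGamma output into
# (mu, v, alpha, beta) and derive the two uncertainty estimates.
def interpret_evidential_output(y_pred):
    mu, v, alpha, beta = tf.split(y_pred, 4, axis=-1)
    aleatoric = beta / (alpha - 1.0)        # E[sigma^2], data noise
    epistemic = beta / (v * (alpha - 1.0))  # Var[mu], model uncertainty
    return mu, aleatoric, epistemic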

Here is the issue I am facing:

  • Before introducing evidential-deep-learning, I used a learning rate of 7e-4 (0.0007), which worked well.
  • With this learning rate I now get loss=NaN, and I still get loss=NaN if I make it smaller (7e-7), usually already in the very first epoch of training.
  • If I set the learning rate extremely low (7e-9) I don't get NaN, but then of course the network learns far too slowly.

Is there any obvious mistake I am making? Any thoughts and help are appreciated.
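
In case it helps with debugging, this is a minimal way to locate the first NaN using standard TensorFlow utilities (nothing specific to this library; x_train and y_train stand in for the real data):

import tensorflow as tf

# Raise an error at the first op that produces inf/NaN instead of training on.
tf.debugging.enable_check_numerics()

# Or, more cheaply, stop training as soon as the loss becomes NaN.
model.fit(x_train, y_train, callbacks=[tf.keras.callbacks.TerminateOnNaN()])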

markus-hinsche · Feb 19 '21 09:02

This may be caused by https://github.com/aamini/evidential-deep-learning/blob/7a22a2c8f35f5a2ec18fd37068b747935ff85376/evidential_deep_learning/losses/continuous.py#L35, where the log is not numerically safe: if its argument underflows to zero, the loss becomes -inf and the gradients turn into NaN.
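
A minimal sketch of the kind of guard that could make it safe (the eps value is an arbitrary choice, and this is not a patch from the repo):

import tensorflow as tf

def safe_log(x, eps=1e-7):
    # Clamp the argument away from zero so that log(0) = -inf
    # (and the NaN gradients that follow) cannot occur.
    return tf.math.log(tf.maximum(x, eps))

An alternative is to add a small epsilon inside the existing log calls, e.g. tf.math.log(x + eps).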

wanzysky · Apr 15 '21 11:04