focal-tversky-unet

pred1 outputs NaN after a few epochs

Open hwaxxer opened this issue 3 years ago • 0 comments

Hi, and thanks for open-sourcing the code. I have been testing the proposed attn_reg model on images of size 128x1024x1. Everything runs well until, after a few epochs, an exception is raised because pred1 outputs NaN:

Invalid argument:  assertion failed: [predictions must be >= 0] [Condition x >= y did not hold element-wise:] [x (functional_1/pred1/Sigmoid:0) = ] [[[[nan][nan][nan]]]...] [y (Cast_8/x:0) = ] [0]

I believe this happens in a tf.keras.metrics callback. Learning seems to converge, but training always ends with this error after 10-50 epochs of 5k images.

Have you seen this, or do you have any idea what could be causing it? The losses are steadily going down, so I'm a bit confused as to what's happening.
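One way to narrow down where the NaN first appears is to scan every model output for non-finite values after each batch or epoch. The following is only a minimal sketch, not code from this repository: the helper name `find_nonfinite` and the output name `pred2` are hypothetical, and it assumes the predictions are available as NumPy arrays (for example from `model.predict`).

```python
import numpy as np


def find_nonfinite(outputs):
    """Return the names of outputs that contain NaN or Inf values.

    `outputs` maps an output name (e.g. "pred1") to an array-like
    of predictions. The helper itself is hypothetical.
    """
    bad = []
    for name, values in outputs.items():
        arr = np.asarray(values, dtype=np.float64)
        # np.isfinite is False for NaN, +Inf and -Inf.
        if not np.all(np.isfinite(arr)):
            bad.append(name)
    return bad


# Made-up example: one output has gone to NaN, the other is healthy.
example = {
    "pred1": np.full((1, 4, 4, 1), np.nan),
    "pred2": np.full((1, 4, 4, 1), 0.5),
}
print(find_nonfinite(example))
```

Inside TensorFlow itself, `tf.keras.callbacks.TerminateOnNaN` stops training as soon as the loss becomes NaN, and `tf.debugging.enable_check_numerics()` raises an error at the first op that produces NaN or Inf, which may help locate the layer involved.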

hwaxxer, Mar 09 '21 23:03