
UNet2DS tensorflow non-deterministic training

Open alexklibisz opened this issue 8 years ago • 3 comments

Just making a note for future reference that training the UNet2DS model on the GPU with the Tensorflow backend results in non-deterministic gradient updates, which in turn produce non-deterministic final results. The final submissions are typically within 2% of each other in terms of mean F1 score, but this still adds a confounding factor when trying to compare changes to the architecture or training strategy.
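To illustrate how this compounds (a toy NumPy sketch, not the actual UNet2DS training code): two SGD runs on identical data with the same seed, differing only in the order in which per-sample gradients are summed, accumulate different float32 rounding and can end up with weights that differ bit-for-bit even though both converge.

```python
import numpy as np

def train(sum_order):
    """Toy linear-regression SGD run; only the gradient reduction order varies."""
    rng = np.random.RandomState(42)  # same data/seed for both runs
    w = np.zeros(4, dtype=np.float32)
    X = rng.randn(256, 4).astype(np.float32)
    y = X @ np.array([1.0, -2.0, 3.0, 0.5], dtype=np.float32)
    for _ in range(500):
        # Per-sample gradient terms of the mean-squared error.
        grad_terms = (X * (X @ w - y)[:, None]) / len(X)
        order = range(len(X)) if sum_order == "fwd" else range(len(X) - 1, -1, -1)
        grad = np.zeros(4, dtype=np.float32)
        for i in order:          # explicit reduction: the order is the only difference
            grad += grad_terms[i]
        w -= np.float32(0.1) * grad
    return w

w1, w2 = train("fwd"), train("rev")
print(np.abs(w1 - w2).max())  # typically nonzero: same seed, different reduction order
```

Both runs recover weights close to the true `[1, -2, 3, 0.5]`, but the low-order bits generally disagree, which is exactly the kind of drift that accumulates across thousands of convolutions per step.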

There is a lot of material online about TF's non-determinism. Most of it points to the fact that the underlying cuDNN implementation uses non-deterministic reductions for convolutions (floating point addition is not associative, so the result depends on the reduction order). The best, most recent insight I could find was in this pull request, with comments indicating there is supposedly a forthcoming fix to address this issue.
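The non-associativity is easy to demonstrate in plain Python, without any GPU involved:

```python
# Floating point addition is not associative: the same three numbers,
# summed in a different grouping, give different results.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

A parallel reduction on the GPU groups its additions differently from run to run (threads finish in a different order), so each convolution's output can differ in the last bits, and those differences compound over training.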

alexklibisz avatar Jul 17 '17 16:07 alexklibisz

This also seems to make a non-trivial difference when training UNet1D. Most deep learning libraries now seem to use cuDNN, so I'm not sure there's a way around this without a fix in cuDNN itself.

alexklibisz avatar Aug 16 '17 19:08 alexklibisz

I have the same issue now with a U-Net for segmentation: the Dice coefficient differs (by about 3 points) between runs with the same seed. Were you able to find a solution for this?
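For context, the Dice coefficient being compared here is typically the standard overlap measure 2|A ∩ B| / (|A| + |B|); a minimal NumPy sketch, assuming binary masks (the `eps` smoothing term is a common convention, not from this thread):

```python
import numpy as np

def dice_coef(y_true, y_pred, eps=1e-7):
    """Dice coefficient for binary masks: 2 * |A ∩ B| / (|A| + |B|)."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)

print(dice_coef([1, 1, 0, 0], [1, 0, 0, 0]))  # 2*1 / (2 + 1) ≈ 0.667
```

A run-to-run swing of several points on this metric is consistent with small per-step gradient differences compounding over training, as described above.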

saeedalahmari3 avatar Dec 12 '18 13:12 saeedalahmari3

No, and based on the issues linked to the PR in my original post, it looks like it hasn't been resolved yet.

alexklibisz avatar Dec 12 '18 20:12 alexklibisz