diffvg
non-deterministic behaviour of the renderer in "painterly_rendering.py"
Hi, thanks a lot for providing this public implementation.
I am trying to achieve deterministic training for reproducibility. In lines 40-41 of "painterly_rendering.py", you set the seeds:
random.seed(1234)
torch.manual_seed(1234)
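(For reference, a fuller seeding setup would look roughly like the sketch below; only the two calls above are what the script actually contains, and the numpy/CUDA lines matter only if those RNGs are involved.)

import random
import numpy as np
import torch

SEED = 1234
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)  # seeds every visible GPU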
However, when running exactly the same command twice, on the same machine and environment, I get different loss values within the first few iterations:
command used:
python painterly_rendering.py imgs/baboon.png --num_iter 3
Output 1:
Scene construction, time: 0.01136 s
Forward pass, time: 0.03906 s
iteration: 0
Scene construction, time: 0.00223 s
Forward pass, time: 0.00845 s
render loss: 0.2781107723712921
Backward pass, time: 0.05038 s
iteration: 1
Scene construction, time: 0.00215 s
Forward pass, time: 0.00980 s
render loss: 0.272503137588501
Backward pass, time: 0.05281 s
iteration: 2
Scene construction, time: 0.00186 s
Forward pass, time: 0.00857 s
render loss: 0.266690731048584
Backward pass, time: 0.07965 s
Scene construction, time: 0.00172 s
Forward pass, time: 0.00384 s
Output 2:
Scene construction, time: 0.02374 s
Forward pass, time: 0.01743 s
iteration: 0
Scene construction, time: 0.00133 s
Forward pass, time: 0.01063 s
render loss: 0.2781107723712921
Backward pass, time: 0.04362 s
iteration: 1
Scene construction, time: 0.00159 s
Forward pass, time: 0.00777 s
render loss: 0.2725030183792114
Backward pass, time: 0.07198 s
iteration: 2
Scene construction, time: 0.00176 s
Forward pass, time: 0.00802 s
render loss: 0.2666889429092407
Backward pass, time: 0.07049 s
Scene construction, time: 0.00244 s
Forward pass, time: 0.00385 s
You can see that in the second iteration (iteration: 1) the losses already differ:
0.272503137588501 vs. 0.2725030183792114
Is there a way to fix that in order to achieve consistent results during training?
Thanks
The problem arises from a data race between threads. Because of limited floating-point precision, addition is not associative, i.e. (a + b) + c != a + (b + c), so the order in which per-thread partial results are accumulated changes the output slightly from run to run. Changing the number of threads in the code to one is a temporary fix for this problem.
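A minimal way to see the non-associativity in plain Python (standalone, not diffvg code): summing the same numbers in a different order can change the last digits of the result, which is exactly what happens when the reduction order across threads varies between runs.

a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)   # 0.6000000000000001
print(a + (b + c))   # 0.6

import random
xs = [random.random() for _ in range(1_000_000)]
s1 = sum(xs)
random.shuffle(xs)   # same values, different summation order
s2 = sum(xs)
print(s1 == s2)      # typically False; the sums differ in the last few digits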