Is deterministic processing supported?
I have an application where I need deterministic training. For instance, with the same input file, I need to be able to construct a trained network that is identical every time. I have tried seeding the RNG at the top of the file by calling srand(12345), but I am still getting non-deterministic behavior. Is this intended, or am I missing something?
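For reference, here is roughly what the top of my file looks like, as a minimal sketch (the actual layer and solver definitions are longer; everything below the seed is just a placeholder):

```julia
using Mocha

srand(12345)             # seed Julia's global RNG before anything else

backend = CPUBackend()
init(backend)

# ... define the data layer, net, and solver as usual, then train:
# solve(solver, net)

shutdown(backend)
```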
@droidicus Maybe try disabling the random shuffling of the data layer if you haven't done so yet?
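For example, assuming you are using an HDF5 data layer (the source path here is a placeholder):

```julia
# Present examples in the same order every epoch by turning off shuffling.
data_layer = HDF5DataLayer(name="train-data", source="data/train.txt",
                           batch_size=64, shuffle=false)
```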
No difference.
I did note that it gives me a different result on runs 1 and 2, but then the same result for runs 2, 3, 4, etc. Weird!
The randomness comes mainly from these sources:
- Random initialization of the weights
- Random shuffling of the data
- Random masking of the data if you have a dropout layer

Everything else should be deterministic, and the behaviour you described sounds really weird.
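As a rough illustration (layer names and dimensions here are made up), those sources show up in a typical definition like this:

```julia
# Weight initialization: the fully-connected layer's weights are drawn
# randomly when the network is set up.
fc1  = InnerProductLayer(name="fc1", output_dim=100,
                         tops=[:fc1], bottoms=[:data])

# Dropout: a new random mask is drawn on every training forward pass.
drop = DropoutLayer(name="drop1", ratio=0.5, bottoms=[:fc1])

# Data shuffling: controlled by the data layer's shuffle flag (see above).
```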
I didn't have much time to look at this over the weekend, but I will do some more tests soon. If I can reproduce the behavior with a simple script, I will get back to you.
Alright, this is getting even weirder... Got two files for you here: https://gist.github.com/droidicus/fbad0be977d86754d772
They should be the same, but the output from the two files is different. Running in CPU mode (to reduce confounding variables) on Julia 0.3.11, the IJulia notebook gives me the following output for the @show calls at the end, with Net2 == Net3 and Net != Net2:
"Net:" => "Net:"
to_array(net.states[2].W) => [0.26551421211326687 0.40548599194932894 -0.17079960984822382 0.5708911704494465 0.17945767179439742 -0.4505212518589521 0.2524846440343289 0.4100377597556418 -0.0378205948402991 0.04556538323403562
0.5614450138141187 0.06794189420618242 -0.4278129700585692 -0.15666869053922852 -0.028669664186539663 -0.060939722266605836 0.4211181361364348 -0.05871191221583788 0.5264304255295991 -0.674667865523455
-0.08426190369719622 0.3575453370134926 -0.23204788716446811 -0.9914168119464785 -0.32106063463133333 0.9581391723454191 -0.03544995373464375 0.2250757533965377 -0.5125497168234864 0.7019169268217516
-0.037082974054073944 -0.5660530741972695 -0.35538569804052295 -0.11791428574273496 -0.24373780147975402 0.1547706620261566 -0.3473401742596861 -0.1482496615083884 -0.24630768001956388 -0.04964220218278014]
"Net2:" => "Net2:"
to_array(net2.states[2].W) => [0.5257203253281926 0.2713077963194356 0.4059804232470407 -0.16559542910138636 0.5714606003794415 0.11546257245782861 -0.46848209367542865 0.08245992238127126 0.4092988013937339 -0.34385124745156453
0.3431515065881919 0.5633112462510935 0.06829432273247854 -0.4243509722064087 -0.16333561564739194 -0.31182243862081077 -0.1410158760142102 0.5360543360168006 -0.05921180071365512 0.15489946759823348
-0.0372759473407259 -0.07798500931590786 0.3576160485979048 -0.23083148038180928 -0.9749761062862892 0.1890078140961957 1.0756862489051717 -0.6069315176989978 0.22501327259729265 -0.22169745838414054
-0.4709449148487559 -0.03490613560578416 -0.5660594227098282 -0.35524674880817536 -0.11049630386217288 -0.024868611512918852 0.2039302360944958 -0.5723418702567421 -0.14821822169759372 -0.10229915936609603]
"Net3:" => "Net3:"
to_array(net3.states[2].W) => [0.5257203253281926 0.2713077963194356 0.4059804232470407 -0.16559542910138636 0.5714606003794415 0.11546257245782861 -0.46848209367542865 0.08245992238127126 0.4092988013937339 -0.34385124745156453
0.3431515065881919 0.5633112462510935 0.06829432273247854 -0.4243509722064087 -0.16333561564739194 -0.31182243862081077 -0.1410158760142102 0.5360543360168006 -0.05921180071365512 0.15489946759823348
-0.0372759473407259 -0.07798500931590786 0.3576160485979048 -0.23083148038180928 -0.9749761062862892 0.1890078140961957 1.0756862489051717 -0.6069315176989978 0.22501327259729265 -0.22169745838414054
-0.4709449148487559 -0.03490613560578416 -0.5660594227098282 -0.35524674880817536 -0.11049630386217288 -0.024868611512918852 0.2039302360944958 -0.5723418702567421 -0.14821822169759372 -0.10229915936609603]
And the plain script file gives this output for the @show calls at the end, with Net == Net2 == Net3:
"Net:" => "Net:"
to_array(net.states[2].W) => [-0.008565843966493904 -0.08681834523650475 -0.32559396535573193 0.5158664228251163 0.3637764590255966 -0.06443126415036486 0.020749776886968125 -0.49545555379581174 -0.4598270988852295 -0.20359297254869949
-0.4483613242899235 -0.4916183158057019 -0.15891124752384342 -0.3419470228031253 0.44891513076615785 -0.4167667149142084 -0.4268246349457795 0.34820025890665524 0.06174706623824192 -0.6018127201687753
0.48617332118022827 -0.3516609089299679 -0.343117015694074 0.11892954716045527 -0.4993760741795881 0.14335897918871646 -0.03867644112058712 0.17059748100329009 0.9390898286763543 1.1111345731582085
0.3357264543015925 0.08841037775817093 0.5036390371248832 -0.4548531296262253 0.34964259440301926 -0.2787733829841452 -0.3018572894112434 -0.1786366678602551 -0.24290980054733347 0.08553500875834226]
"Net2:" => "Net2:"
to_array(net2.states[2].W) => [-0.008565843966493904 -0.08681834523650475 -0.32559396535573193 0.5158664228251163 0.3637764590255966 -0.06443126415036486 0.020749776886968125 -0.49545555379581174 -0.4598270988852295 -0.20359297254869949
-0.4483613242899235 -0.4916183158057019 -0.15891124752384342 -0.3419470228031253 0.44891513076615785 -0.4167667149142084 -0.4268246349457795 0.34820025890665524 0.06174706623824192 -0.6018127201687753
0.48617332118022827 -0.3516609089299679 -0.343117015694074 0.11892954716045527 -0.4993760741795881 0.14335897918871646 -0.03867644112058712 0.17059748100329009 0.9390898286763543 1.1111345731582085
0.3357264543015925 0.08841037775817093 0.5036390371248832 -0.4548531296262253 0.34964259440301926 -0.2787733829841452 -0.3018572894112434 -0.1786366678602551 -0.24290980054733347 0.08553500875834226]
"Net3:" => "Net3:"
to_array(net3.states[2].W) => [-0.008565843966493904 -0.08681834523650475 -0.32559396535573193 0.5158664228251163 0.3637764590255966 -0.06443126415036486 0.020749776886968125 -0.49545555379581174 -0.4598270988852295 -0.20359297254869949
-0.4483613242899235 -0.4916183158057019 -0.15891124752384342 -0.3419470228031253 0.44891513076615785 -0.4167667149142084 -0.4268246349457795 0.34820025890665524 0.06174706623824192 -0.6018127201687753
0.48617332118022827 -0.3516609089299679 -0.343117015694074 0.11892954716045527 -0.4993760741795881 0.14335897918871646 -0.03867644112058712 0.17059748100329009 0.9390898286763543 1.1111345731582085
0.3357264543015925 0.08841037775817093 0.5036390371248832 -0.4548531296262253 0.34964259440301926 -0.2787733829841452 -0.3018572894112434 -0.1786366678602551 -0.24290980054733347 0.08553500875834226]
What is even weirder: if I remove the @show net (and net2, net3, etc.) immediately before the solver line in both files, the output of the script is unchanged, but the notebook's output changes, with Net == Net2 == Net3:
"Net:" => "Net:"
to_array(net.states[2].W) => [0.5257203253281926 0.2713077963194356 0.4059804232470407 -0.16559542910138636 0.5714606003794415 0.11546257245782861 -0.46848209367542865 0.08245992238127126 0.4092988013937339 -0.34385124745156453
0.3431515065881919 0.5633112462510935 0.06829432273247854 -0.4243509722064087 -0.16333561564739194 -0.31182243862081077 -0.1410158760142102 0.5360543360168006 -0.05921180071365512 0.15489946759823348
-0.0372759473407259 -0.07798500931590786 0.3576160485979048 -0.23083148038180928 -0.9749761062862892 0.1890078140961957 1.0756862489051717 -0.6069315176989978 0.22501327259729265 -0.22169745838414054
-0.4709449148487559 -0.03490613560578416 -0.5660594227098282 -0.35524674880817536 -0.11049630386217288 -0.024868611512918852 0.2039302360944958 -0.5723418702567421 -0.14821822169759372 -0.10229915936609603]
"Net2:" => "Net2:"
to_array(net2.states[2].W) => [0.5257203253281926 0.2713077963194356 0.4059804232470407 -0.16559542910138636 0.5714606003794415 0.11546257245782861 -0.46848209367542865 0.08245992238127126 0.4092988013937339 -0.34385124745156453
0.3431515065881919 0.5633112462510935 0.06829432273247854 -0.4243509722064087 -0.16333561564739194 -0.31182243862081077 -0.1410158760142102 0.5360543360168006 -0.05921180071365512 0.15489946759823348
-0.0372759473407259 -0.07798500931590786 0.3576160485979048 -0.23083148038180928 -0.9749761062862892 0.1890078140961957 1.0756862489051717 -0.6069315176989978 0.22501327259729265 -0.22169745838414054
-0.4709449148487559 -0.03490613560578416 -0.5660594227098282 -0.35524674880817536 -0.11049630386217288 -0.024868611512918852 0.2039302360944958 -0.5723418702567421 -0.14821822169759372 -0.10229915936609603]
"Net3:" => "Net3:"
to_array(net3.states[2].W) => [0.5257203253281926 0.2713077963194356 0.4059804232470407 -0.16559542910138636 0.5714606003794415 0.11546257245782861 -0.46848209367542865 0.08245992238127126 0.4092988013937339 -0.34385124745156453
0.3431515065881919 0.5633112462510935 0.06829432273247854 -0.4243509722064087 -0.16333561564739194 -0.31182243862081077 -0.1410158760142102 0.5360543360168006 -0.05921180071365512 0.15489946759823348
-0.0372759473407259 -0.07798500931590786 0.3576160485979048 -0.23083148038180928 -0.9749761062862892 0.1890078140961957 1.0756862489051717 -0.6069315176989978 0.22501327259729265 -0.22169745838414054
-0.4709449148487559 -0.03490613560578416 -0.5660594227098282 -0.35524674880817536 -0.11049630386217288 -0.024868611512918852 0.2039302360944958 -0.5723418702567421 -0.14821822169759372 -0.10229915936609603]
I have no idea why. For now I can ditch the notebook, or just run a dummy training before my "real" training, but it would be nice if this were fixed in either Mocha or IJulia.
@droidicus I see what is happening now. It is due to Mocha's parameter sharing, which exists so that a training network and a testing network can share their parameters. When you train several networks on the same backend, you accidentally share the parameters between them. The reason you end up printing the same W after training all the networks is that they are literally the same W. The reason you did not get the same printout in the notebook version is that the first @show prints the original W, and when you then train network 2, that shared W gets updated further.
There are several possible solutions (a rough sketch follows the list):
- Name the layers properly; the layer names are used as the keys for parameter sharing. For example, you can prepend "net-$i" to the names of those layers.
- Call registry_reset(backend) to clear all saved parameters before training another network.
- Use a different backend for each network.
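A rough sketch of the first two options (the layer definition is just a placeholder):

```julia
for i in 1:3
    # Option 1: unique layer names per network, so each one gets its own
    # entry in the parameter registry instead of sharing an existing one.
    fc = InnerProductLayer(name="net-$i-fc1", output_dim=100,
                           tops=[:fc1], bottoms=[:data])
    # ... build the net and solver, then solve(solver, net) as before ...

    # Option 2: or clear the saved parameters on this backend before the
    # next network is set up.
    registry_reset(backend)
end
```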
I tried all three solutions (together and separately). I am also saving the values of W immediately after training each network, before setup for the next training begins. I updated my example code here: https://gist.github.com/droidicus/fbad0be977d86754d772
I am seeing the same behavior as before: in the IJulia notebook, Net != Net2 and Net2 == Net3; in the *.jl script, Net == Net2 == Net3.
This also doesn't explain why the output from the notebook and the script differ, or why the @show net (/net2/net3) calls make a difference for the notebook but not the script.