Is deterministic processing supported?
I have an application where I need deterministic training. For instance, with the same input file, I need to be able to construct a trained network that is identical every time. I have tried seeding the RNG at the top of the file by calling srand(12345), but I am still getting non-deterministic behavior. Is this intended, or am I missing something?
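For reference, here is roughly what the top of my file looks like, as a minimal sketch (the actual layer and solver definitions are longer; everything below the seed is just a placeholder):

```julia
using Mocha

srand(12345)             # seed Julia's global RNG before anything else

backend = CPUBackend()
init(backend)

# ... define the data layer, net, and solver as usual, then train:
# solve(solver, net)

shutdown(backend)
```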
@droidicus Maybe try disabling the random shuffling of the data layer if you haven't done so yet?
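For example, assuming you are using an HDF5 data layer (the source path here is a placeholder):

```julia
# Present examples in the same order every epoch by turning off shuffling.
data_layer = HDF5DataLayer(name="train-data", source="data/train.txt",
                           batch_size=64, shuffle=false)
```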
No difference.
I did note that it gives me a different result on runs 1 and 2, but then the same result for runs 2, 3, 4, etc. Weird!
The randomness comes mainly from these sources:
- Random initialization of the weights
- Random shuffling of the data
- Random masking of the data if you have a dropout layer

Everything else should be deterministic, and the behaviour you described sounds really weird.
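As a rough illustration (layer names and dimensions here are made up), those sources show up in a typical definition like this:

```julia
# Weight initialization: the fully-connected layer's weights are drawn
# randomly when the network is set up.
fc1  = InnerProductLayer(name="fc1", output_dim=100,
                         tops=[:fc1], bottoms=[:data])

# Dropout: a new random mask is drawn on every training forward pass.
drop = DropoutLayer(name="drop1", ratio=0.5, bottoms=[:fc1])

# Data shuffling: controlled by the data layer's shuffle flag (see above).
```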
I didn't have much time to look at this over the weekend, but I will do some more tests soon. If I can reproduce the behavior with a simple script, I will get back to you.
Alright, this is getting even weirder... Got two files for you here: https://gist.github.com/droidicus/fbad0be977d86754d772
They should be the same, but the output from the two files is different. Running in CPU mode (to reduce confounding variables) on Julia 0.3.11, the IJulia notebook gives me the following output for the @show calls at the end, with Net2 == Net3 and Net != Net2:
"Net:" => "Net:"
to_array(net.states[2].W) => [0.26551421211326687 0.40548599194932894 -0.17079960984822382 0.5708911704494465 0.17945767179439742 -0.4505212518589521 0.2524846440343289 0.4100377597556418 -0.0378205948402991 0.04556538323403562
0.5614450138141187 0.06794189420618242 -0.4278129700585692 -0.15666869053922852 -0.028669664186539663 -0.060939722266605836 0.4211181361364348 -0.05871191221583788 0.5264304255295991 -0.674667865523455
-0.08426190369719622 0.3575453370134926 -0.23204788716446811 -0.9914168119464785 -0.32106063463133333 0.9581391723454191 -0.03544995373464375 0.2250757533965377 -0.5125497168234864 0.7019169268217516
-0.037082974054073944 -0.5660530741972695 -0.35538569804052295 -0.11791428574273496 -0.24373780147975402 0.1547706620261566 -0.3473401742596861 -0.1482496615083884 -0.24630768001956388 -0.04964220218278014]
"Net2:" => "Net2:"
to_array(net2.states[2].W) => [0.5257203253281926 0.2713077963194356 0.4059804232470407 -0.16559542910138636 0.5714606003794415 0.11546257245782861 -0.46848209367542865 0.08245992238127126 0.4092988013937339 -0.34385124745156453
0.3431515065881919 0.5633112462510935 0.06829432273247854 -0.4243509722064087 -0.16333561564739194 -0.31182243862081077 -0.1410158760142102 0.5360543360168006 -0.05921180071365512 0.15489946759823348
-0.0372759473407259 -0.07798500931590786 0.3576160485979048 -0.23083148038180928 -0.9749761062862892 0.1890078140961957 1.0756862489051717 -0.6069315176989978 0.22501327259729265 -0.22169745838414054
-0.4709449148487559 -0.03490613560578416 -0.5660594227098282 -0.35524674880817536 -0.11049630386217288 -0.024868611512918852 0.2039302360944958 -0.5723418702567421 -0.14821822169759372 -0.10229915936609603]
"Net3:" => "Net3:"
to_array(net3.states[2].W) => [0.5257203253281926 0.2713077963194356 0.4059804232470407 -0.16559542910138636 0.5714606003794415 0.11546257245782861 -0.46848209367542865 0.08245992238127126 0.4092988013937339 -0.34385124745156453
0.3431515065881919 0.5633112462510935 0.06829432273247854 -0.4243509722064087 -0.16333561564739194 -0.31182243862081077 -0.1410158760142102 0.5360543360168006 -0.05921180071365512 0.15489946759823348
-0.0372759473407259 -0.07798500931590786 0.3576160485979048 -0.23083148038180928 -0.9749761062862892 0.1890078140961957 1.0756862489051717 -0.6069315176989978 0.22501327259729265 -0.22169745838414054
-0.4709449148487559 -0.03490613560578416 -0.5660594227098282 -0.35524674880817536 -0.11049630386217288 -0.024868611512918852 0.2039302360944958 -0.5723418702567421 -0.14821822169759372 -0.10229915936609603]
And the plain script file gives this output for the @show calls at the end, with Net == Net2 == Net3:
"Net:" => "Net:"
to_array(net.states[2].W) => [-0.008565843966493904 -0.08681834523650475 -0.32559396535573193 0.5158664228251163 0.3637764590255966 -0.06443126415036486 0.020749776886968125 -0.49545555379581174 -0.4598270988852295 -0.20359297254869949
-0.4483613242899235 -0.4916183158057019 -0.15891124752384342 -0.3419470228031253 0.44891513076615785 -0.4167667149142084 -0.4268246349457795 0.34820025890665524 0.06174706623824192 -0.6018127201687753
0.48617332118022827 -0.3516609089299679 -0.343117015694074 0.11892954716045527 -0.4993760741795881 0.14335897918871646 -0.03867644112058712 0.17059748100329009 0.9390898286763543 1.1111345731582085
0.3357264543015925 0.08841037775817093 0.5036390371248832 -0.4548531296262253 0.34964259440301926 -0.2787733829841452 -0.3018572894112434 -0.1786366678602551 -0.24290980054733347 0.08553500875834226]
"Net2:" => "Net2:"
to_array(net2.states[2].W) => [-0.008565843966493904 -0.08681834523650475 -0.32559396535573193 0.5158664228251163 0.3637764590255966 -0.06443126415036486 0.020749776886968125 -0.49545555379581174 -0.4598270988852295 -0.20359297254869949
-0.4483613242899235 -0.4916183158057019 -0.15891124752384342 -0.3419470228031253 0.44891513076615785 -0.4167667149142084 -0.4268246349457795 0.34820025890665524 0.06174706623824192 -0.6018127201687753
0.48617332118022827 -0.3516609089299679 -0.343117015694074 0.11892954716045527 -0.4993760741795881 0.14335897918871646 -0.03867644112058712 0.17059748100329009 0.9390898286763543 1.1111345731582085
0.3357264543015925 0.08841037775817093 0.5036390371248832 -0.4548531296262253 0.34964259440301926 -0.2787733829841452 -0.3018572894112434 -0.1786366678602551 -0.24290980054733347 0.08553500875834226]
"Net3:" => "Net3:"
to_array(net3.states[2].W) => [-0.008565843966493904 -0.08681834523650475 -0.32559396535573193 0.5158664228251163 0.3637764590255966 -0.06443126415036486 0.020749776886968125 -0.49545555379581174 -0.4598270988852295 -0.20359297254869949
-0.4483613242899235 -0.4916183158057019 -0.15891124752384342 -0.3419470228031253 0.44891513076615785 -0.4167667149142084 -0.4268246349457795 0.34820025890665524 0.06174706623824192 -0.6018127201687753
0.48617332118022827 -0.3516609089299679 -0.343117015694074 0.11892954716045527 -0.4993760741795881 0.14335897918871646 -0.03867644112058712 0.17059748100329009 0.9390898286763543 1.1111345731582085
0.3357264543015925 0.08841037775817093 0.5036390371248832 -0.4548531296262253 0.34964259440301926 -0.2787733829841452 -0.3018572894112434 -0.1786366678602551 -0.24290980054733347 0.08553500875834226]
What is even weirder: if I remove the @show net (and net2, net3, etc.) immediately before the solver line in both files, the output of the script is unchanged, but the notebook's output changes, with Net == Net2 == Net3:
"Net:" => "Net:"
to_array(net.states[2].W) => [0.5257203253281926 0.2713077963194356 0.4059804232470407 -0.16559542910138636 0.5714606003794415 0.11546257245782861 -0.46848209367542865 0.08245992238127126 0.4092988013937339 -0.34385124745156453
0.3431515065881919 0.5633112462510935 0.06829432273247854 -0.4243509722064087 -0.16333561564739194 -0.31182243862081077 -0.1410158760142102 0.5360543360168006 -0.05921180071365512 0.15489946759823348
-0.0372759473407259 -0.07798500931590786 0.3576160485979048 -0.23083148038180928 -0.9749761062862892 0.1890078140961957 1.0756862489051717 -0.6069315176989978 0.22501327259729265 -0.22169745838414054
-0.4709449148487559 -0.03490613560578416 -0.5660594227098282 -0.35524674880817536 -0.11049630386217288 -0.024868611512918852 0.2039302360944958 -0.5723418702567421 -0.14821822169759372 -0.10229915936609603]
"Net2:" => "Net2:"
to_array(net2.states[2].W) => [0.5257203253281926 0.2713077963194356 0.4059804232470407 -0.16559542910138636 0.5714606003794415 0.11546257245782861 -0.46848209367542865 0.08245992238127126 0.4092988013937339 -0.34385124745156453
0.3431515065881919 0.5633112462510935 0.06829432273247854 -0.4243509722064087 -0.16333561564739194 -0.31182243862081077 -0.1410158760142102 0.5360543360168006 -0.05921180071365512 0.15489946759823348
-0.0372759473407259 -0.07798500931590786 0.3576160485979048 -0.23083148038180928 -0.9749761062862892 0.1890078140961957 1.0756862489051717 -0.6069315176989978 0.22501327259729265 -0.22169745838414054
-0.4709449148487559 -0.03490613560578416 -0.5660594227098282 -0.35524674880817536 -0.11049630386217288 -0.024868611512918852 0.2039302360944958 -0.5723418702567421 -0.14821822169759372 -0.10229915936609603]
"Net3:" => "Net3:"
to_array(net3.states[2].W) => [0.5257203253281926 0.2713077963194356 0.4059804232470407 -0.16559542910138636 0.5714606003794415 0.11546257245782861 -0.46848209367542865 0.08245992238127126 0.4092988013937339 -0.34385124745156453
0.3431515065881919 0.5633112462510935 0.06829432273247854 -0.4243509722064087 -0.16333561564739194 -0.31182243862081077 -0.1410158760142102 0.5360543360168006 -0.05921180071365512 0.15489946759823348
-0.0372759473407259 -0.07798500931590786 0.3576160485979048 -0.23083148038180928 -0.9749761062862892 0.1890078140961957 1.0756862489051717 -0.6069315176989978 0.22501327259729265 -0.22169745838414054
-0.4709449148487559 -0.03490613560578416 -0.5660594227098282 -0.35524674880817536 -0.11049630386217288 -0.024868611512918852 0.2039302360944958 -0.5723418702567421 -0.14821822169759372 -0.10229915936609603]
I have no idea why. For now I can ditch the notebook, or just run a dummy training before my "real" training, but it would be nice if this were fixed in either Mocha or IJulia.
@droidicus I see what is happening now. It is due to Mocha's parameter sharing, which exists so that a training network and a testing network can share their parameters. When you train several networks on the same backend, you accidentally share the parameters between them. The reason you end up printing the same W after training all the networks is that they are literally the same W. The reason you did not get the same printout in the notebook version is that the first @show prints the original W, and when you then train network 2, that shared W gets updated further.
There are several possible solutions (a rough sketch follows the list):
- Name the layers properly; the layer names are used as the keys for parameter sharing. For example, you can prepend "net-$i" to the names of those layers.
- Call registry_reset(backend) to clear all saved parameters before training another network.
- Use a different backend for each network.
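A rough sketch of the first two options (the layer definition is just a placeholder):

```julia
for i in 1:3
    # Option 1: unique layer names per network, so each one gets its own
    # entry in the parameter registry instead of sharing an existing one.
    fc = InnerProductLayer(name="net-$i-fc1", output_dim=100,
                           tops=[:fc1], bottoms=[:data])
    # ... build the net and solver, then solve(solver, net) as before ...

    # Option 2: or clear the saved parameters on this backend before the
    # next network is set up.
    registry_reset(backend)
end
```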
I tried all three solutions (together and separately). I am also saving the values of W immediately after training each network, before setup for the next training begins. I updated my example code here: https://gist.github.com/droidicus/fbad0be977d86754d772
I am seeing the same behavior as before: in the IJulia notebook, Net != Net2 and Net2 == Net3; in the *.jl script, Net == Net2 == Net3.
This also doesn't explain why the output from the notebook and the script differ, or why the @show net (/net2/net3) calls make a difference for the notebook but not the script.