
Copy task - Learning on input of length 1

tristandeleu opened this issue • 2 comments

As suggested by @adrienball, I ran an experiment training the NTM on length-one inputs only, to see whether it could already learn such a simple behavior (even if it overfits). The NTM successfully recovered the length-one inputs: [image: copy-1] https://cloud.githubusercontent.com/assets/2018752/9998956/21bae724-6094-11e5-982a-31db67fd3bef.png
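For context, here is a minimal sketch of how a copy-task example of a given length can be generated (random binary vectors plus a delimiter channel, with the target asking for the sequence back after the delimiter). The names and exact layout are assumptions for illustration, not the repository's actual data generator.

```python
import numpy as np

def copy_example(length=1, size=8, rng=np.random):
    # `length` random binary vectors of width `size`, followed by a
    # delimiter flag on an extra input channel; the target is blank during
    # the input phase and then asks for the sequence back.
    seq = rng.binomial(1, 0.5, (length, size)).astype(np.float32)
    inp = np.zeros((2 * length + 1, size + 1), dtype=np.float32)
    inp[:length, :size] = seq
    inp[length, size] = 1.0  # end-of-sequence delimiter
    out = np.zeros((2 * length + 1, size), dtype=np.float32)
    out[length + 1:] = seq
    return inp, out

# A length-one training example, as in the experiment above.
x, y = copy_example(length=1)
```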

When I tested this trained NTM on longer inputs, it consistently failed to recover the full sequences (as expected, given the lack of variety in the training lengths), but it generally succeeded in remembering the first vector. However, some interesting patterns emerged:

  • The NTM was sometimes able to recover the first 2 vectors, even though it had never seen any input longer than one vector: [image: copy-10-partial] https://cloud.githubusercontent.com/assets/2018752/9999194/4c64eef6-6095-11e5-9b29-9ad011eab63d.png
  • The NTM sometimes repeated this first vector (with some "noise") multiple times. This property has come up frequently enough to be worth investigating: [image: copy-10-repeat] https://cloud.githubusercontent.com/assets/2018752/9999072/aaf44bc0-6094-11e5-9a63-f6b2528ea8f4.png
Parameters of the experiment
  • NTM layer with FeedForward controller + 1 read head + 1 write head
  • Update rule: Graves' RMSprop with learning_rate=1e-3 (the other parameters are kept as in his previous paper; see the sketch after this list)
  • Activations: ReLU for [add, key, beta], 1 + ReLU for gamma, sigmoid for [gate, dense_output], softmax for shift
  • Initialization: Glorot uniform for every weight matrix and the memory init, zeros for every bias and the hidden state init, EquiProba for the read & write weights init
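For reference, here is a minimal NumPy sketch of the Graves' RMSprop update mentioned above. Only learning_rate=1e-3 comes from this experiment; the decay, momentum and epsilon defaults are the ones from Graves (2013), and the function and state names are assumptions for illustration, not the repository's actual update code.

```python
import numpy as np

def graves_rmsprop_step(param, grad, state, learning_rate=1e-3,
                        decay=0.95, momentum=0.9, epsilon=1e-4):
    # `state` holds per-parameter running averages: E[g^2], E[g], and the
    # previous update. The gradient is rescaled by an estimate of its
    # standard deviation before being applied with momentum.
    n, g, delta = state
    n = decay * n + (1. - decay) * grad ** 2
    g = decay * g + (1. - decay) * grad
    delta = momentum * delta - learning_rate * grad / np.sqrt(n - g ** 2 + epsilon)
    return param + delta, (n, g, delta)
```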
Learning curve

Gray: cost function; red: moving average of the cost function over 500 iterations. [image: copy-learning-curve]
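For completeness, a sketch of the smoothing behind the red curve: a plain running mean over a 500-iteration window (the exact smoothing used for the plot is an assumption).

```python
import numpy as np

def moving_average(costs, window=500):
    # Running mean of the per-iteration cost over the last `window` steps.
    kernel = np.ones(window) / window
    return np.convolve(costs, kernel, mode='valid')
```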

tristandeleu · Sep 21 '15 17:09

Man this is so exciting!


adrienball · Sep 22 '15 07:09

Nice!


maelp · Sep 22 '15 08:09