
Copy task - Learning on input of length 1

tristandeleu opened this issue • 2 comments

As suggested by @adrienball, I ran an experiment training the NTM on length-one inputs only, to see whether it could already learn such a simple behavior (even if it overfits). The NTM successfully recovered the length-one inputs: [image: copy-1] https://cloud.githubusercontent.com/assets/2018752/9998956/21bae724-6094-11e5-982a-31db67fd3bef.png
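For context, here is a minimal sketch of how a copy-task example of a given length can be generated (random binary vectors plus a delimiter channel, with the target asking for the sequence back after the delimiter). The names and exact layout are assumptions for illustration, not the repository's actual data generator.

```python
import numpy as np

def copy_example(length=1, size=8, rng=np.random):
    # `length` random binary vectors of width `size`, followed by a
    # delimiter flag on an extra input channel; the target is blank during
    # the input phase and then asks for the sequence back.
    seq = rng.binomial(1, 0.5, (length, size)).astype(np.float32)
    inp = np.zeros((2 * length + 1, size + 1), dtype=np.float32)
    inp[:length, :size] = seq
    inp[length, size] = 1.0  # end-of-sequence delimiter
    out = np.zeros((2 * length + 1, size), dtype=np.float32)
    out[length + 1:] = seq
    return inp, out

# A length-one training example, as in the experiment above.
x, y = copy_example(length=1)
```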

When I tested this trained NTM on longer inputs, it consistently failed to recover the full sequences (as expected, given the lack of variety in the training lengths), but it generally succeeded in remembering the first vector. However, some interesting patterns emerged:

  • The NTM was sometimes able to recover the first 2 vectors, even though it had never seen any input longer than one vector: [image: copy-10-partial] https://cloud.githubusercontent.com/assets/2018752/9999194/4c64eef6-6095-11e5-9b29-9ad011eab63d.png
  • The NTM sometimes repeated this first vector (with some "noise") multiple times. This property has come up frequently enough to be worth investigating: [image: copy-10-repeat] https://cloud.githubusercontent.com/assets/2018752/9999072/aaf44bc0-6094-11e5-9a63-f6b2528ea8f4.png
Parameters of the experiment
  • NTM layer with FeedForward controller + 1 read head + 1 write head
  • Update rule: Graves' RMSprop with learning_rate=1e-3 (the other parameters are kept as in his previous paper; see the sketch after this list)
  • Activations: ReLU for [add, key, beta], 1 + ReLU for gamma, sigmoid for [gate, dense_output], softmax for shift
  • Initialization: Glorot uniform for every weight matrix and the memory init, zeros for every bias and the hidden state init, EquiProba for the read & write weights init
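For reference, here is a minimal NumPy sketch of the Graves' RMSprop update mentioned above. Only learning_rate=1e-3 comes from this experiment; the decay, momentum and epsilon defaults are the ones from Graves (2013), and the function and state names are assumptions for illustration, not the repository's actual update code.

```python
import numpy as np

def graves_rmsprop_step(param, grad, state, learning_rate=1e-3,
                        decay=0.95, momentum=0.9, epsilon=1e-4):
    # `state` holds per-parameter running averages: E[g^2], E[g], and the
    # previous update. The gradient is rescaled by an estimate of its
    # standard deviation before being applied with momentum.
    n, g, delta = state
    n = decay * n + (1. - decay) * grad ** 2
    g = decay * g + (1. - decay) * grad
    delta = momentum * delta - learning_rate * grad / np.sqrt(n - g ** 2 + epsilon)
    return param + delta, (n, g, delta)
```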
Learning curve

Gray: cost function; red: moving average of the cost function over 500 iterations. [image: copy-learning-curve]
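For completeness, a sketch of the smoothing behind the red curve: a plain running mean over a 500-iteration window (the exact smoothing used for the plot is an assumption).

```python
import numpy as np

def moving_average(costs, window=500):
    # Running mean of the per-iteration cost over the last `window` steps.
    kernel = np.ones(window) / window
    return np.convolve(costs, kernel, mode='valid')
```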

tristandeleu · Sep 21 '15 17:09

Man this is so exciting!


adrienball · Sep 22 '15 07:09

Nice!


maelp · Sep 22 '15 08:09