torch-ntm
model learned for the copy task with default parameters doesn't generalize
I modified the copy task to save the model every 100 iterations. After running
th tasks/copy.lua
with the default parameters, I played around with the model, and it is clear that seq and torch.gt(forward(model, seq), .5) are very different for most choices of seq. I saved the loss per sequence in the names of the serialization files, and they suggest that the good performance of the model at the end of training is a fluke (of the manual random seed 0 hardcoded into copy.lua).
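For reference, the change amounts to roughly the following inside the training loop (a sketch only; model, loss, and iter stand in for whatever names tasks/copy.lua actually uses):

-- every 100 iterations, serialize the model with the iteration count
-- and the current loss encoded in the filename
if iter % 100 == 0 then
   local fname = string.format('epoch%05d_%.4f.model', iter, loss)
   torch.save(fname, model)
end

The losses encoded in the filenames stay high for nearly the entire run: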
$> ls | tail
epoch09100_3.9718.model
epoch09200_4.5562.model
epoch09300_4.5259.model
epoch09400_4.4691.model
epoch09500_5.2147.model
epoch09600_3.6268.model
epoch09700_1.8399.model
epoch09800_1.2273.model
epoch09900_0.5149.model
epoch10000_0.2789.model
Compare with
$> ls | head
epoch00100_5.5219.model
epoch00200_5.6047.model
epoch00300_5.5736.model
epoch00400_5.6103.model
epoch00500_5.5607.model
epoch00600_5.5996.model
epoch00700_5.4609.model
epoch00800_5.7227.model
epoch00900_5.5619.model
epoch01000_5.5650.model
In fact, the only times the loss per sequence drops below 1 are in the last two saved epochs.
Did you get a different result with the default parameters?
Are you seeing this when you run the trained model on new examples, or when training a model from scratch using a different random seed?
I trained the model with the default parameters (in particular, with the random seed 0 hard-coded into the file). After the training finished, I checked it in the th interpreter with new random examples; for example:
th> seq = generate_sequence(5, 8)
[0.0002s]
th> output = forward(model, seq)
[0.3955s]
th> seq
0 0 0 1 1 1 0 0 0 0
0 0 1 0 1 0 0 0 1 0
0 0 0 0 0 0 1 0 1 0
0 0 0 0 0 1 0 1 0 0
0 0 1 0 1 1 0 0 0 0
[torch.DoubleTensor of size 5x10]
[0.0001s]
th> torch.gt(output, .5)
0 0 0 1 0 1 0 0 1 1
0 0 1 1 1 0 1 1 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0
0 0 1 0 1 0 0 0 0 0
[torch.ByteTensor of size 5x10]
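Rather than eyeballing individual sequences, the mismatch can be counted directly (a sketch, using the generate_sequence and forward helpers from the session above):

-- count how many bits the thresholded output gets wrong on a fresh sequence
seq = generate_sequence(5, 8)
pred = torch.gt(forward(model, seq), 0.5):double()
errors = torch.ne(pred, seq):sum()
print(string.format('%d of %d bits wrong', errors, seq:nElement()))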
OK, I see. I'll have a look -- thanks for pointing out this issue.
I'm getting the same non-convergence for the associative recall task. After running for 50000 epochs, my log files indicate that the losses still fluctuate quite a bit:
$> ls results/1442574433/ | tail -n 20
epoch48100_9.0221.model
epoch48200_9.0616.model
epoch48300_0.0393.model
epoch48400_8.7340.model
epoch48500_11.9884.model
epoch48600_10.3534.model
epoch48700_4.9097.model
epoch48800_10.3642.model
epoch48900_0.0683.model
epoch49000_11.1729.model
epoch49100_4.0965.model
epoch49200_2.9329.model
epoch49300_8.6038.model
epoch49400_0.0436.model
epoch49500_0.0823.model
epoch49600_8.0040.model
epoch49700_7.4330.model
epoch49800_10.0362.model
epoch49900_7.4087.model
epoch50000_10.0742.model
Compare with
$> ls results/1442574433/ | head -n 20
epoch00100_12.3335.model
epoch00200_12.5032.model
epoch00300_12.2715.model
epoch00400_12.5553.model
epoch00500_12.5074.model
epoch00600_12.5660.model
epoch00700_12.6916.model
epoch00800_12.4654.model
epoch00900_12.4325.model
epoch01000_12.6071.model
epoch01100_12.4807.model
epoch01200_12.2052.model
epoch01300_12.3652.model
epoch01400_12.5196.model
epoch01500_12.3711.model
epoch01600_12.7436.model
epoch01700_12.3393.model
epoch01800_12.4030.model
epoch01900_12.3585.model
epoch02000_12.3311.model
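To see the whole trajectory rather than just the head and tail of the directory, the losses can be parsed back out of the checkpoint filenames (a sketch, assuming the paths package that ships with Torch and the directory layout shown above):

require 'paths'

-- collect (epoch, loss) pairs from the serialized model filenames
local losses = {}
for _, f in ipairs(paths.dir('results/1442574433') or {}) do
   local epoch, loss = f:match('^epoch(%d+)_([%d%.]+)%.model$')
   if epoch then table.insert(losses, {tonumber(epoch), tonumber(loss)}) end
end
table.sort(losses, function(a, b) return a[1] < b[1] end)
for _, p in ipairs(losses) do print(string.format('%d\t%.4f', p[1], p[2])) end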
Here is an example output from one of the models saved at the end.
-- query 1 of 4: index 4
0 0 1 0 1 0 1 0
0 0 1 0 0 0 1 1
0 0 1 1 1 1 0 0
[torch.DoubleTensor of size 3x8]
-- target 1 of 4: index 5
0 0 1 0 1 0 1 0
0 0 0 0 1 0 1 1
0 0 0 0 1 1 0 1
[torch.DoubleTensor of size 3x8]
-- output 1 of 4
0.0001 0.0001 0.5167 0.4437 0.4385 0.4977 0.9151 0.0787
0.0000 0.0000 0.3638 0.2945 0.4635 0.0279 0.7452 0.7260
0.0000 0.0000 0.1876 0.7161 0.5930 0.3402 0.0610 0.1457
[torch.DoubleTensor of size 3x8]
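Thresholding this output the same way as for the copy task makes the mismatch with the target explicit (a sketch, with output and target being the tensors printed above):

-- binarize the output at 0.5 and count disagreements with the target
pred = torch.gt(output, 0.5):double()
print(pred)
print(string.format('%d of %d bits wrong', torch.ne(pred, target):sum(), target:nElement()))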