Mocha.jl InexactError when training "LeNet" on 1d image data

I am new to Mocha and am trying to adapt the LeNet tutorial to my 1d image dataset. The only changes I made are to the kernel size and the stride size, as follows:


data_layer  = AsyncHDF5DataLayer(name="data", source="data/train.txt", batch_size=64, shuffle=true)
conv_layer  = ConvolutionLayer(name="conv1", n_filter=20, kernel=(5,1), bottoms=[:data], tops=[:conv])
pool_layer  = PoolingLayer(name="pool1", kernel=(2,1), stride=(2,1), bottoms=[:conv], tops=[:pool])
conv2_layer = ConvolutionLayer(name="conv2", n_filter=50, kernel=(5,1), bottoms=[:pool], tops=[:conv2])
pool2_layer = PoolingLayer(name="pool2", kernel=(2,1), stride=(2,1), bottoms=[:conv2], tops=[:pool2])
fc1_layer   = InnerProductLayer(name="ip1", output_dim=500, neuron=Neurons.ReLU(), bottoms=[:pool2], tops=[:ip1])
fc2_layer   = InnerProductLayer(name="ip2", output_dim=2, bottoms=[:ip1], tops=[:ip2])
loss_layer  = SoftmaxLossLayer(name="loss", bottoms=[:ip2,:label])

After the network is constructed, I get the following error message:

04-Apr 23:17:53:INFO:root:## Performance on Validation Set after 0 iterations
04-Apr 23:17:53:INFO:root:---------------------------------------------------------
04-Apr 23:17:53:INFO:root:  Accuracy (avg over 15300) = 93.8627%
04-Apr 23:17:53:INFO:root:---------------------------------------------------------
04-Apr 23:17:53:INFO:root:
04-Apr 23:17:54:DEBUG:root:#DEBUG Entering solver loop
ERROR: LoadError: InexactError()
 in max_pooling_forward at /Users/cinvro/.julia/v0.4/Mocha/src/layers/pooling/julia-impl.jl:34
 in forward at /Users/cinvro/.julia/v0.4/Mocha/src/layers/pooling.jl:93
 in forward at /Users/cinvro/.julia/v0.4/Mocha/src/layers/pooling.jl:84
 in forward at /Users/cinvro/.julia/v0.4/Mocha/src/net.jl:148
 in onestep_solve at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:222
 in do_solve_loop at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:242
 in solve at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:235
 in include at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
 in include_from_node1 at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
 in process_options at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
 in _start at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib

Any idea why this happens?

My net looks like this:

[image: net]

Apr 05 '16 03:04 cinvro

The line of code reporting InexactError is this line: https://github.com/pluskid/Mocha.jl/blob/master/src/layers/pooling/julia-impl.jl#L34

It is trying to assign a value to the mask, which is unsigned. If you try to assign an invalid value (e.g. a negative value), an InexactError will occur. My guess was that the pooling range somehow goes out of bounds, producing a negative value there. But looking at the visualization you pasted above, it seems perfectly valid. Can you maybe try inserting a print statement

println((maxh-1) * width + maxw-1)

right before that line to see what value we got that caused the error?
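
To illustrate the failure mode described above, here is a minimal sketch in plain Julia. It does not use Mocha; the values of maxh, maxw and width are hypothetical, and the mask element type is assumed to be an unsigned integer such as UInt.

```julia
# Hypothetical values: no maximum was found, so the indices stay at 0.
maxh, maxw, width = 0, 0, 10

# Same expression as in the println above.
offset = (maxh - 1) * width + maxw - 1
println(offset)   # -11, negative because maxh == 0

# Assumption: the mask stores unsigned integers (UInt is used here for illustration).
# Converting a negative integer to an unsigned type throws InexactError.
function fits_unsigned(x)
    try
        convert(UInt, x)
        return true
    catch err
        isa(err, InexactError) || rethrow()
        return false
    end
end

println(fits_unsigned(offset))   # false
println(fits_unsigned(0))        # true
```

If the printed value is negative, the conversion to the unsigned mask type is what raises the error, and the real question becomes why maxh or maxw was never set.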

Apr 05 '16 05:04 pluskid

@pluskid you are right, I got -180, where maxh=0, maxw=0 and width=179. What does that mean? Is that a problem with my data or a bug?

Apr 05 '16 15:04 cinvro

It seems like some pooling region is empty. Just as a sanity check, can you change the kernel for the pooling layer from (2,1) to larger values like (3,1) to see if it runs? Thanks!

Apr 06 '16 16:04 pluskid

Thank you for the reply. Yes, I got the following error after changing the kernel size of the pooling layer from (2,1) to (3,1).

ERROR: LoadError: AssertionError: is_similar_shape(params[j],net.states[i].parameters[j].blob)
 in load_network at /Users/cinvro/.julia/v0.4/Mocha/src/utils/io.jl:102
 in anonymous at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:158
 in jldopen at /Users/cinvro/.julia/v0.4/JLD/src/JLD.jl:245
 in load_snapshot at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:157
 in init_solve at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:184
 in solve at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:234
 in include at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
 in include_from_node1 at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
 in process_options at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
 in _start at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib

Apr 06 '16 17:04 cinvro

@cinvro That is due to previously saved snapshots. Can you remove the saved snapshot files and try again? Thanks!

Apr 06 '16 20:04 pluskid

@pluskid oh, I didn't realize that. Now I get -179, where maxh=0, maxw=0 and width=178.

Apr 06 '16 20:04 cinvro

@cinvro I checked the code and did not find the bug. It seems the body of the pooling loop is never executed (otherwise maxh and maxw would not be zero). Can you print, at the same place, the values of hstart, hend, wstart, wend as well as val and maxval? One potential problem is that your matrix contains NaN. In that case, NaN > -Inf is false, so the pooling is unsuccessful.
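
The comparison issue can be sketched in plain Julia. This is a simplified stand-in for a max search over a single pooling window, not the actual Mocha implementation; the function name window_argmax is made up for illustration.

```julia
# Search one pooling window for its maximum using a strict `>` test.
# Returns the position of the maximum, or (0, 0) if no element ever wins.
function window_argmax(window)
    maxval = -Inf
    maxh, maxw = 0, 0
    for w in 1:size(window, 2), h in 1:size(window, 1)
        val = window[h, w]
        if val > maxval
            maxval = val
            maxh, maxw = h, w
        end
    end
    return maxh, maxw
end

println(window_argmax([1.0 3.0]))     # position (1, 2)
println(window_argmax([NaN NaN]))     # position (0, 0): NaN > -Inf is false
println(window_argmax([-Inf -Inf]))   # position (0, 0): -Inf > -Inf is also false
```

In both failing cases the indices keep their initial value of zero, which would then produce a negative offset for the mask.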

Apr 09 '16 19:04 pluskid

@pluskid I got hstart=1,hend=1,wstart=89,wend=90 and maxval=-Inf. I cannot print out val because it says val is undefined, which is very strange.

Apr 12 '16 17:04 cinvro

However, I can print out val inside the for loop, which gives me val = -Inf in this case.

Apr 12 '16 19:04 cinvro

I can reproduce this error when I do not set the neuron property on the convolutional layer. It took me a while to narrow it down, but once I set neuron=Neurons.ReLU() on the convolutional layer, the InexactError (NaN value for maxval in function max_pooling_forward) went away.

I see that the code posted here also doesn't have a neuron defined on the convolutional layer, so I suspect the same is the case here.
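
One possible reading of why this helps, given the val = -Inf reported earlier in the thread: a ReLU maps -Inf to 0, and 0 compares greater than the -Inf starting value of the max search. The sketch below uses the standard ReLU definition in plain Julia; it is an assumption for illustration and is not copied from Mocha's source.

```julia
# Standard ReLU definition (assumed here, not taken from Mocha).
relu(x) = max(x, zero(x))

println(relu(-Inf))          # 0.0
println(relu(-Inf) > -Inf)   # true, so a strict `>` comparison against -Inf can succeed
println(relu(2.5))           # 2.5, positive values pass through unchanged
```

This only explains the symptom going away. It does not explain where the -Inf or NaN activations come from in the first place, so the input data is still worth checking.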

Jul 26 '16 21:07 davidparks21
