Mocha.jl
InexactError when training "LeNet" on 1d image data
I am new to Mocha and am trying to adapt the LeNet tutorial to my 1d image dataset. Basically, I only changed the kernel and stride sizes slightly, as follows:
data_layer = AsyncHDF5DataLayer(name="data", source="data/train.txt", batch_size=64, shuffle=true)
conv_layer = ConvolutionLayer(name="conv1", n_filter=20, kernel=(5,1), bottoms=[:data], tops=[:conv])
pool_layer = PoolingLayer(name="pool1", kernel=(2,1), stride=(2,1), bottoms=[:conv], tops=[:pool])
conv2_layer = ConvolutionLayer(name="conv2", n_filter=50, kernel=(5,1), bottoms=[:pool], tops=[:conv2])
pool2_layer = PoolingLayer(name="pool2", kernel=(2,1), stride=(2,1), bottoms=[:conv2], tops=[:pool2])
fc1_layer = InnerProductLayer(name="ip1", output_dim=500, neuron=Neurons.ReLU(), bottoms=[:pool2], tops=[:ip1])
fc2_layer = InnerProductLayer(name="ip2", output_dim=2, bottoms=[:ip1], tops=[:ip2])
loss_layer = SoftmaxLossLayer(name="loss", bottoms=[:ip2,:label])
After the network is constructed, I get the following error message:
04-Apr 23:17:53:INFO:root:## Performance on Validation Set after 0 iterations
04-Apr 23:17:53:INFO:root:---------------------------------------------------------
04-Apr 23:17:53:INFO:root: Accuracy (avg over 15300) = 93.8627%
04-Apr 23:17:53:INFO:root:---------------------------------------------------------
04-Apr 23:17:53:INFO:root:
04-Apr 23:17:54:DEBUG:root:#DEBUG Entering solver loop
ERROR: LoadError: InexactError()
in max_pooling_forward at /Users/cinvro/.julia/v0.4/Mocha/src/layers/pooling/julia-impl.jl:34
in forward at /Users/cinvro/.julia/v0.4/Mocha/src/layers/pooling.jl:93
in forward at /Users/cinvro/.julia/v0.4/Mocha/src/layers/pooling.jl:84
in forward at /Users/cinvro/.julia/v0.4/Mocha/src/net.jl:148
in onestep_solve at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:222
in do_solve_loop at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:242
in solve at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:235
in include at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
in include_from_node1 at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
in process_options at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
in _start at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
Any idea why this happens?
My net looks like this:
The line of code reporting InexactError is this line: https://github.com/pluskid/Mocha.jl/blob/master/src/layers/pooling/julia-impl.jl#L34
It is trying to assign a value to the mask, which is unsigned. If you try to assign an invalid value (e.g. a negative value), an InexactError will occur. My guess was that the pooling range somehow goes out of bounds, producing a negative value there. But looking at the visualization you pasted above, it seems perfectly valid. Can you maybe try to insert a print statement
println((maxh-1) * width + maxw-1)
right before that line to see what value we got that caused the error?
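To illustrate the failure mode described above, here is a minimal standalone sketch that does not use Mocha. It assumes the mask behaves like a plain array of `UInt` (the actual element type in Mocha may differ), and the values of `maxh`, `maxw` and `width` are the ones reported in this thread.

```julia
# Stand-in for the pooling mask: an array of unsigned integers.
# (Assumption: the real mask in Mocha may use a different unsigned type.)
mask = zeros(UInt, 4)

# Values reported in this thread.
maxh, maxw, width = 0, 0, 179
idx = (maxh - 1) * width + maxw - 1   # evaluates to -180

# Storing a negative value converts it to UInt, which throws InexactError.
threw = try
    mask[1] = idx
    false
catch err
    isa(err, InexactError)
end

println("idx = ", idx, ", InexactError thrown: ", threw)
```

So a negative index only appears when `maxh` and `maxw` are never updated from their initial value of 0.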
@pluskid you are right, I got -180, where maxh=0, maxw=0 and width=179.
What does that mean? Is that a problem of my data or a bug?
It seems like some pooling region is empty. Just as a sanity check, can you change the kernel for the pooling layer from (2,1) to larger values like (3,1) to see if it runs? Thanks!
Thank you for the reply.
Yes. I got the following error after changing the kernel size of the pooling layer from (2,1) to (3,1).
ERROR: LoadError: AssertionError: is_similar_shape(params[j],net.states[i].parameters[j].blob)
in load_network at /Users/cinvro/.julia/v0.4/Mocha/src/utils/io.jl:102
in anonymous at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:158
in jldopen at /Users/cinvro/.julia/v0.4/JLD/src/JLD.jl:245
in load_snapshot at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:157
in init_solve at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:184
in solve at /Users/cinvro/.julia/v0.4/Mocha/src/solvers.jl:234
in include at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
in include_from_node1 at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
in process_options at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
in _start at /Applications/Julia-0.4.3.app/Contents/Resources/julia/lib/julia/sys.dylib
@cinvro That is due to previously saved snapshots. Can you remove the saved snapshot files and try again? Thanks!
@pluskid oh, I didn't realize that.
Now I get -179, where maxh=0, maxw=0 and width=178.
@cinvro I checked the code and did not find the bug. It seems the pooling loop is not executed (otherwise maxh and maxw should not be zero). Can you print, at the same place, the values of hstart, hend, wstart, wend as well as val and maxval? One potential problem is that your matrix contains NaN. In this case, NaN > -Inf is false, so the pooling is unsuccessful.
@pluskid I got hstart=1, hend=1, wstart=89, wend=90 and maxval=-Inf.
I cannot print out val because it says val is undefined, which is very strange.
However, I can print out val inside the for loop, which gives me val = -Inf in this case.
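The comparison behaviour discussed above can be reproduced with a simplified 1-D scan. This is only a sketch of the general pattern (a running maximum initialised to -Inf and updated with a strict `>`), not Mocha's actual pooling code, and `scan_max` is a hypothetical name. It suggests that a window holding only NaN or only -Inf would leave the index at its initial value of 0.

```julia
# Simplified 1-D version of a max-pooling scan (a sketch, not Mocha's code).
# Returns the maximum and its index; the index stays 0 if no element
# ever compares strictly greater than the running maximum.
function scan_max(window)
    maxval = -Inf
    maxidx = 0
    for (i, val) in enumerate(window)
        if val > maxval
            maxval = val
            maxidx = i
        end
    end
    return maxval, maxidx
end

println(scan_max([0.5, 1.5]))    # normal case: the index is updated
println(scan_max([NaN, NaN]))    # NaN > -Inf is false: index stays 0
println(scan_max([-Inf, -Inf]))  # -Inf > -Inf is also false: index stays 0
```

If the real loop follows this pattern, the reported val = -Inf and maxval = -Inf would be enough to trigger the error even without NaN in the data.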
I can reproduce this error when I do not set the neuron property on the convolutional layer. It took me a while to narrow it down, but once I set neuron=Neurons.ReLU() on the convolutional layer the InexactError (NaN value for maxval in function max_pooling_forward) went away.
I see that the code posted here also doesn't have a neuron defined on the convolutional layer, so I suspect the same is the case here.
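One possible reason the workaround helps, sketched here under the assumption that ReLU is the usual max(x, 0): a rectifier maps -Inf to 0, so the pooling layer would no longer see a window made up entirely of -Inf. The `relu` function below is a standalone stand-in, not Mocha's Neurons.ReLU implementation, and the sketch does not explain where the -Inf values come from in the first place.

```julia
# Assumption: ReLU is the usual max(x, 0); this is a standalone sketch,
# not Mocha's Neurons.ReLU implementation.
relu(x) = max(x, zero(x))

activations = [-Inf, -2.0, 3.0]
rectified = map(relu, activations)   # -Inf and -2.0 both become 0.0

println(rectified)
```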