Error when training started from another model
I tried to initialize the model from a previously trained model.
However, when I set 'start_from' to the path of that trained model, I got the error shown below.
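For context, the run that triggers this looks something like the line below, assuming the standard -start_from flag in train.lua (the checkpoint path is a placeholder taken from the log):

    th train.lua -start_from checkpoint_path/model_id.t7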
initializing weights from checkpoint_path/model_id.t7
...jian/torch/install/bin/luajit: torch/install/share/lua/5.1/nn/Module.lua:297: misaligned parameter at 2
stack traceback:
[C]: in function 'assert'
...jian/torch/install/share/lua/5.1/nn/Module.lua:297: in function 'getParameters'
train.lua:158: in main chunk
[C]: in function 'dofile'
...jian/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
It seems the storageOffsets of the parameters and gradParameters do not match, presumably because of net_utils.unsanitize_gradients and net_utils.sanitize_gradients.
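As a sanity check: getParameters() flattens all parameter tensors and all gradient tensors into two big storages and asserts that each tensor lands at the same offset in both, so the error fires when the sharing pattern of the parameters differs from that of the gradients. A minimal diagnostic sketch, assuming 'model' is the network loaded from the checkpoint:

    -- compare the sharing pattern of parameters vs. gradParameters;
    -- module:parameters() returns matching lists of tensors, and if two
    -- parameter tensors share a storage, the corresponding gradient
    -- tensors must share one too, or getParameters() raises
    -- 'misaligned parameter'
    local params, gradParams = model:parameters()
    for i = 1, #params do
      for j = 1, i - 1 do
        local pShared = torch.pointer(params[i]:storage()) ==
                        torch.pointer(params[j]:storage())
        local gShared = torch.pointer(gradParams[i]:storage()) ==
                        torch.pointer(gradParams[j]:storage())
        if pShared ~= gShared then
          print(string.format('sharing mismatch between tensors %d and %d', j, i))
        end
      end
    end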
I ran into the same problem and I don't understand it either. My guess is that the trained model stored its parameters or gradParameters inconsistently when it was saved, so reloading it doesn't work.
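If that guess is right, the mismatch would come from how the gradient buffers are rebuilt on load. A hedged sketch of the failure mode under that assumption (this is my reading, not the actual net_utils code):

    -- saving: strip the gradient buffers so the checkpoint stays small
    for _, m in ipairs(model:listModules()) do
      m.gradWeight = nil
      m.gradBias = nil
    end
    torch.save('model_id.t7', model)

    -- loading: recreate the gradients as *fresh* tensors; if two modules
    -- shared one weight tensor before saving, their recreated gradients
    -- no longer share storage, which is exactly the layout mismatch that
    -- getParameters() rejects
    local loaded = torch.load('model_id.t7')
    for _, m in ipairs(loaded:listModules()) do
      if m.weight and not m.gradWeight then
        m.gradWeight = m.weight:clone():zero()
      end
      if m.bias and not m.gradBias then
        m.gradBias = m.bias:clone():zero()
      end
    end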
I use this method:

    -- save the flattened parameters of each layer separately
    local params = {}
    for i = 1, 21 do
      params[i] = model:get(i):getParameters()
    end
    torch.save('params.t7', params)  -- write the whole table to disk

to save the parameters of every layer (21 is the number of layers in my model), and then copy these saved parameters into a new model that I create.
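The copy-back step would then look something like the sketch below; 'newModel' is hypothetical and must have the same layer-by-layer architecture so that each flattened parameter vector matches the saved one in size:

    -- hedged sketch of the copy-back step; 'newModel' is a hypothetical
    -- fresh instance of the same 21-layer architecture
    local saved = torch.load('params.t7')
    for i = 1, #saved do
      newModel:get(i):getParameters():copy(saved[i])
    end

Note that getParameters() re-flattens the tensors into a new storage each time it is called, so it is normally called only once on the whole model; per-layer calls like this are best confined to the save/restore step.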