neuraltalk2

Error when training started from another model

Open wujian752 opened this issue 8 years ago • 2 comments

I tried to initialize the model from a previously trained model.

However, when I set 'start_from' to the path of the trained model, I got the error shown below.

initializing weights from checkpoint_path/model_id.t7
...jian/torch/install/bin/luajit: torch/install/share/lua/5.1/nn/Module.lua:297: misaligned parameter at 2
stack traceback:
        [C]: in function 'assert'
        ...jian/torch/install/share/lua/5.1/nn/Module.lua:297: in function 'getParameters'
        train.lua:158: in main chunk
        [C]: in function 'dofile'
        ...jian/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00406670

It seems the storageOffset of parameters and gradParameters are not the same, because of net_utils.unsanitize_gradients and net_utils.sanitize_gradients.
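For context: neuraltalk2 calls net_utils.sanitize_gradients before saving a checkpoint (clearing gradWeight/gradBias to shrink the file) and net_utils.unsanitize_gradients after loading, and nn's getParameters asserts "misaligned parameter" if parameters and gradients end up with different storage layouts when it flattens them. A minimal sketch of the idea, assuming the checkpoint has been loaded into a protos table as in train.lua (the table keys here are illustrative, not a confirmed fix):

```lua
-- Sketch: re-allocate gradients for every loaded module *before* flattening,
-- so parameters and gradParameters have matching layouts when
-- getParameters() walks the network.
local net_utils = require 'misc.net_utils'

for k, v in pairs(protos) do
  net_utils.unsanitize_gradients(v)  -- restores gradWeight/gradBias tensors
end

-- only flatten after all gradients exist again:
local params, grad_params = protos.lm:getParameters()
```

If the offsets still mismatch after this, the checkpoint itself may have been saved with inconsistently sanitized modules.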

wujian752 avatar Nov 25 '16 17:11 wujian752

I met the same problem and I haven't understood why. I think the trained model may have exchanged the parameters or gradParameters in the wrong way during training, so it doesn't work.

jimie208 avatar Dec 22 '16 01:12 jimie208

I use this method:

        params = {}
        for i = 1, 21 do  -- 21 is the number of layers in my model
          params[i] = model:get(i):getParameters()
        end
        torch.save('params.t7', params)  -- filename is just an example

to get the params of every layer separately, and then use these params to initialize a new model I create. (Note the original snippet called torch.save(params[i]) after the loop, where i is out of scope and no filename is given; the whole params table should be saved instead.)

jimie208 avatar Dec 22 '16 02:12 jimie208