
Help with using autograd in training with wrapped NN modules

Open synchro-- opened this issue 8 years ago • 4 comments

Let's say I have a whole network built with nn, called 'model', that I wrapped like this:

modelFunction, params = autograd.functionalize(model)
neuralNet = function(params, input, target) ...  return myCustomLoss end   
df = autograd(neuralNet)
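
For concreteness, this is roughly what I mean — just a sketch, with a hypothetical squared-error loss standing in for myCustomLoss, and assuming (as I understand functionalize) that modelFunction(params, input) runs the forward pass of the wrapped net:

local autograd = require 'autograd'

-- wrap the nn container so it can be used inside an autograd function
local modelFunction, params = autograd.functionalize(model)

local neuralNet = function(params, input, target)
   local prediction = modelFunction(params, input)
   -- hypothetical stand-in for myCustomLoss: a plain squared-error loss
   return torch.sum(torch.pow(prediction - target, 2))
end

-- df(params, input, target) returns the gradients w.r.t. params,
-- followed by everything neuralNet returns (here: the loss)
local df = autograd(neuralNet)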

Now I want to train my model. Since I already have my typical training procedure (with the mini-batch closure) written and ready, I would like to keep most of it, only exploiting the very easy way autograd gives me the gradients. So let's compare the two methods, so that you can tell me whether this is actually possible.

The usual:

local feval = function(x)
  if x ~= parameters then
     parameters:copy(x)
  end
  -- reset gradients
  gradParameters:zero()

  -- f is the average of all criterions
  local f = 0

  -- evaluate function for complete mini batch
  for i = 1,#inputs do
     -- estimate f
     local output = model:forward(inputs[i])
     local err = criterion:forward(output, targets[i])
     f = f + err

     -- estimate df/dW
     local df_do = criterion:backward(output, targets[i])
     model:backward(inputs[i], df_do)
  end
  -- normalize gradients and f(X)
  gradParameters:div(#inputs)
  f = f/#inputs

  -- return f and df/dX
  return f,gradParameters
end
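
For completeness, this is roughly how I drive that closure with optim (sketch only; maxEpochs is just an illustrative name, and the mini-batch inputs/targets and optimState are set up elsewhere in my script):

local optimState = {learningRate = 0.01}
for epoch = 1, maxEpochs do
   -- inputs/targets for the current mini-batch are prepared here
   local _, fs = optim.sgd(feval, parameters, optimState)
   print('epoch ' .. epoch .. ', loss ' .. fs[1])
end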

So, using autograd while making the smallest possible changes, it would be:

-- create closure to evaluate f(X) and df/dX
local feval = function(x)
   -- get new parameters
   if x ~= parameters then
      parameters:copy(x)
   end
   -- reset gradients
   gradParameters:zero()

   -- f is the average of all criterions
   local f = 0

   -- evaluate function for complete mini batch
   for i = 1,#inputs do
      -- estimate f and df/dW in a single call to the autograd-wrapped function
      local df_do, err, output = df(params, inputs[i], targets[i])
      f = f + err
      model:backward(inputs[i], df_do)
   end

   -- normalize gradients and f(X)
   gradParameters:div(#inputs)
   f = f/#inputs
   -- return f and df/dX
   return f,gradParameters
end

And then I would go on using the optim module in the classical way. Is this not possible, or not recommended?
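
Or, as an alternative I have considered but not yet run: drop the model:backward call entirely and accumulate the gradient table that df returns into the flat gradParameters myself. This assumes parameters, gradParameters = model:getParameters() was called before functionalize, and that the tensors in params line up with the layout of the flat parameters tensor (which is how getParameters builds it):

local feval = function(x)
   if x ~= parameters then
      parameters:copy(x)
   end
   gradParameters:zero()
   local f = 0
   for i = 1, #inputs do
      -- df returns the gradients w.r.t. params plus the loss
      local grads, err = df(params, inputs[i], targets[i])
      f = f + err
      -- copy the per-sample gradient table into the flat gradParameters
      local offset = 0
      for _, g in ipairs(grads) do
         local n = g:nElement()
         gradParameters:narrow(1, offset + 1, n):add(g:contiguous():view(n))
         offset = offset + n
      end
   end
   gradParameters:div(#inputs)
   f = f / #inputs
   return f, gradParameters
end

Would that be the preferred way to plug autograd into an existing optim loop?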

synchro-- avatar Jan 04 '17 05:01 synchro--

@synchro-- were you successful in doing this? I am mixing optim with wrapped nn modules and getting the following error:

/Graph.lua:40: bad argument #2 to 'fn' (expecting number or torch.DoubleTensor or torch.DoubleStorage at /tmp/luarocks_torch-scm-1-9261/torch7/generic/Tensor.c:1125)

sebastiangonsal avatar Mar 23 '17 10:03 sebastiangonsal

I actually never tried it in the end. I dropped the project because I was working on something else, but it is something I could try in the coming weeks. Keep me posted if you manage to use optim that way.

synchro-- avatar Mar 23 '17 13:03 synchro--

I can confirm that it is possible to mix optim with wrapped nn modules. The errors you are hitting are likely due to features that autograd does not support.

biggerlambda avatar Mar 28 '17 00:03 biggerlambda

@biggerlambda Thanks. Do you have any example of that?

synchro-- avatar Apr 19 '17 16:04 synchro--