
graphgen doesn't work with DataParallelTable

Open szagoruyko opened this issue 9 years ago • 2 comments

repro:

require 'cunn'
require 'cudnn'
local generateGraph = require 'optnet.graphgen'
local iterm = require 'iterm'

local model = nn.DataParallelTable(1)

model:add(cudnn.SpatialConvolution(3,96,7,7,3,3),1)
model:add(cudnn.SpatialConvolution(3,96,7,7,3,3),2)

model:cuda()

local input = torch.randn(32,3,224,224):cuda()

iterm.dot(generateGraph(model, input))

gives

/opt/rocks/distro/install/bin/luajit: /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:141: Unwritable object <userdata> at <?>.<?>.updateOutput.basefunc.errcheck.C
stack traceback:
    [C]: in function 'error'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:141: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:200: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:200: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:200: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    ...istro/install/share/lua/5.1/cudnn/SpatialConvolution.lua:470: in function 'write'
    /opt/rocks/distro/install/share/lua/5.1/torch/File.lua:210: in function 'writeObject'
    /opt/rocks/distro/install/share/lua/5.1/nn/Module.lua:107: in function 'clone'
    .../distro/install/share/lua/5.1/cunn/DataParallelTable.lua:634: in function 'applyChanges'
    .../distro/install/share/lua/5.1/cunn/DataParallelTable.lua:472: in function 'apply'
    /opt/rocks/distro/install/share/lua/5.1/optnet/graphgen.lua:221: in function 'generateGraph'
    /tmp/graphgen_fail.lua:15: in main chunk

szagoruyko, Apr 28 '16 14:04

Thanks for the example, Sergey! I managed to reduce the problem to the following snippet (independent of graphgen or cudnn):

require 'cunn'
model = nn.DataParallelTable(1)
model:add(nn.SpatialConvolution(3,96,7,7,3,3),1)
model:add(nn.SpatialConvolution(3,96,7,7,3,3),2)
model:cuda()
input = torch.randn(32,3,224,224):cuda()
function f(m)
  local ff = m.updateOutput
  m.updateOutput = function(self, i)
    return ff(self, i)
  end
end
model:apply(f)
model:forward(input);

This behaviour is inconsistent with other modules, where wrapping updateOutput this way works as expected. This looks like a bug in nn.DataParallelTable, or am I missing something?
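To make the wrapping pattern concrete, here is a minimal sketch in plain Lua (no Torch needed). `Module`, `wrap` and `upvalueNames` are hypothetical stand-ins, not part of nn or optnet. It shows only that, after wrapping, `updateOutput` becomes a field stored on the instance, and that this field is a closure holding the original method as an upvalue. The traceback above shows `clone` going through `writeObject` and failing at a path ending in `updateOutput.basefunc.errcheck.C`, so a plausible reading (an assumption, not verified against the DataParallelTable source) is that serializing the module now also has to serialize this closure and whatever it captures.

```lua
-- Plain Lua (5.1 / LuaJIT), no Torch required.
-- `Module` is a hypothetical stand-in for an nn module class.
local Module = {}
Module.__index = Module

function Module.new()
  return setmetatable({}, Module)
end

function Module:updateOutput(input)
  return input * 2
end

-- Same wrapping pattern as f(m) in the snippet above.
local function wrap(m)
  local ff = m.updateOutput            -- resolved through the metatable
  m.updateOutput = function(self, i)   -- stored on the instance itself
    return ff(self, i)
  end
end

-- Collect the names of a function's upvalues.
local function upvalueNames(fn)
  local names = {}
  local i = 1
  while true do
    local name = debug.getupvalue(fn, i)
    if name == nil then break end
    names[#names + 1] = name
    i = i + 1
  end
  return names
end

local m = Module.new()
local hadOwnField = rawget(m, 'updateOutput') ~= nil  -- checked before wrapping
wrap(m)
local hasOwnField = rawget(m, 'updateOutput') ~= nil  -- checked after wrapping
local names = upvalueNames(m.updateOutput)            -- upvalues of the wrapper
local result = m:updateOutput(21)                     -- still calls the original
```

Before wrapping, the instance has no `updateOutput` field of its own and the method lives only on the class table. After wrapping, the instance carries a closure, so any code that copies the module by writing out its fields has to deal with that closure and its upvalues.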

fmassa, Apr 28 '16 18:04

@szagoruyko I proposed a quick fix for this issue in https://github.com/fmassa/optimize-net/commit/0c7c216b81ae9b020d71b597afb2c0997b332a15. With it, the test snippet you sent works. Could you check whether it also works for your models?

fmassa, Apr 29 '16 10:04