tensorboardX
Variable slice/index assignment graph breaking
I have been facing an issue when trying to create a graph of a Module in which some Variables have slice assignment operations. I have reduced the problem to the following example; ignore the commented-out V = x for now.
import torch
from torch.autograd import Variable
from tensorboardX import SummaryWriter

class DummyModule(torch.nn.Module):
    def forward(self, x):
        V = Variable(torch.Tensor(2, 2))
        V[0, 0] = x
        # V = x
        return torch.sum(V * 3)

x = Variable(torch.Tensor([1]), requires_grad=True)
r = DummyModule()(x)
r.backward()
print(x.grad)

w = SummaryWriter()
x = Variable(torch.Tensor([1]), requires_grad=True)
w.add_graph(DummyModule(), x, verbose=True)
The output from this is below, showing that the gradients flow correctly but the graph is not being connected. If I insert another input Variable and other operations in the Module, add_graph() works without throwing an error, but the graph shows a disconnected input for x, so I suppose the nature of this error is that the only available input Variable is being interpreted as disconnected.
Variable containing:
3
[torch.FloatTensor of size (1,)]
Traceback (most recent call last):
File "test_grad.py", line 21, in <module>
w.add_graph(DummyModule(), x, verbose=True)
File "/Users/filiped/anaconda/envs/pytorch0.4/lib/python3.6/site-packages/tensorboardX/writer.py", line 400, in add_graph
self.file_writer.add_graph(graph(model, input_to_model, verbose))
File "/Users/filiped/anaconda/envs/pytorch0.4/lib/python3.6/site-packages/tensorboardX/graph.py", line 44, in graph
trace, _ = torch.jit.trace(model, args)
File "/Users/filiped/anaconda/envs/pytorch0.4/lib/python3.6/site-packages/torch/jit/__init__.py", line 251, in trace
return TracedModule(f, nderivs=nderivs)(*args, **kwargs)
File "/Users/filiped/anaconda/envs/pytorch0.4/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/Users/filiped/anaconda/envs/pytorch0.4/lib/python3.6/site-packages/torch/jit/__init__.py", line 287, in forward
torch._C._tracer_exit(out_vars)
RuntimeError: /Users/filiped/pytorch/torch/csrc/jit/tracer.h:117: getTracingState: Assertion `state` failed.
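The traceback above ends inside torch.jit.trace, so the failure may be reproducible without tensorboardX. Below is a minimal sketch of that check. It assumes a recent PyTorch in which torch.jit.trace(module, example_input) returns a traced module exposing a .graph attribute (the 0.4 API in the traceback returned a tuple instead). TraceCheckModule is a hypothetical name, and it uses torch.zeros instead of an uninitialized tensor so the result is deterministic.

```python
import torch


class TraceCheckModule(torch.nn.Module):
    # Hypothetical stand-in for DummyModule, with V zero-initialized.
    def forward(self, x):
        V = torch.zeros(2, 2)
        V[0, 0] = x  # the index assignment under investigation
        return torch.sum(V * 3)


x = torch.tensor([1.0], requires_grad=True)

# Trace directly with PyTorch, bypassing tensorboardX entirely.
traced = torch.jit.trace(TraceCheckModule(), x)

# Print the graph to inspect whether the input feeds into the later nodes.
print(traced.graph)

# Compare the traced module against eager execution.
traced_out = traced(x)
eager_out = TraceCheckModule()(x)
```

If the trace itself fails or the printed graph does not use its input, the problem is likely on the PyTorch side and not in tensorboardX.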
Moreover, if you uncomment the line V = x and comment out the line above it, so that no slice/index assignment is performed, you get, as expected:
Variable containing:
3
[torch.FloatTensor of size (1,)]
graph(%0 : Float(1)) {
%1 : UNKNOWN_TYPE = Constant[value={3}](), scope: DummyModule
%2 : Float(1) = Mul[broadcast=1](%0, %1), scope: DummyModule
%3 : Float() = Sum(%2), scope: DummyModule
return (%3);
}
This was all executed in Pytorch 0.4
(Edits: Did a couple rounds of re-simplifying the example.)
In Pytorch 0.3.1 this does not seem to be a problem, with the output being:
Variable containing:
3
[torch.FloatTensor of size 1]
/Users/filiped/anaconda/lib/python3.6/site-packages/torch/onnx/__init__.py:244: UserWarning: ONNX export failed on Constant because torch.onnx.symbolic.Constant does not exist
.format(op_name, op_name))
/Users/filiped/anaconda/lib/python3.6/site-packages/torch/onnx/__init__.py:244: UserWarning: ONNX export failed on sum because torch.onnx.symbolic.sum does not exist
.format(op_name, op_name))
graph(%1 : Float(1)) {
%2 : Float(2, 2) = Constant[value= 1.0000e+00 -4.6566e-10 -6.4371e+05 1.0845e-19 [ CPUFloatTensor{2,2} ]](), uses = [%3.i0], scope: DummyModule;
%4 : Float(2, 2), %5 : Handle = ^SetItem((0, 0))(%2, %1), uses = [[%7.i0], []], scope: DummyModule;
%6 : UNKNOWN_TYPE = Constant[value={3}](), uses = [%7.i1], scope: DummyModule;
%7 : Float(2, 2) = Mul[broadcast=1](%4, %6), uses = [%8.i0], scope: DummyModule;
%8 : Float() = sum(%7), uses = [%0.i0], scope: DummyModule;
return (%8);
}
Thanks for the report. The tensorboardX code paths for v0.3 and v0.4 are different. With the code you used, v0.3 or v0.3.1 goes through ONNX export as an intermediate buffer to add the graph, while in v0.4 tensorboardX exports the graph more directly.
I just merged patch #83 so that tensorboardX has similar behavior for v0.3.1 and v0.4. I think this patch plus pytorch v0.3.1 should also fail on your code.
I will inspect this once the CI test has passed.
@lanpa the results in Pytorch 0.3.1 that I reported above were already with the patch pulled in
@filipeabperes Do you mean 24a0d77?
@lanpa Yeah, I just re-pulled from the repo and ran the example to test, with same results as above
I replaced w.add_graph(DummyModule(), x, verbose=True) with torch.onnx.export(DummyModule(), x, "./IndexLayer.pb", verbose=True).
In pytorch v0.4, I got the same error message:
...
getTracingState: Assertion `state` failed.
In pytorch v0.3.1, it becomes:
Traceback (most recent call last):
File "fili.py", line 22, in <module>
torch.onnx.export(DummyModule(), x, "./IndexLayer.pb", verbose=True)
File "/Users/dexter/anaconda3/lib/python3.6/site-packages/torch/onnx/__init__.py", line 75, in export
_export(model, args, f, export_params, verbose, training)
File "/Users/dexter/anaconda3/lib/python3.6/site-packages/torch/onnx/__init__.py", line 131, in _export
proto = trace.export(list(model.state_dict().values()), _onnx_opset_version)
RuntimeError: ONNX export failed: Couldn't export Python operator SetItem
Graph we tried to export:
graph(%1 : Float(1)) {
%2 : Float(2, 2) = Constant[value= 1.0000e+00 -2.5244e-29 4.5598e+20 -1.0845e-19 [ CPUFloatTensor{2,2} ]](), uses = [%3.i0], scope: DummyModule;
%4 : Float(2, 2), %5 : Handle = ^SetItem((0, 0))(%2, %1), uses = [[%7.i0], []], scope: DummyModule;
%6 : UNKNOWN_TYPE = Constant[value={3}](), uses = [%7.i1], scope: DummyModule;
%7 : Float(2, 2) = Mul[broadcast=1](%4, %6), uses = [%8.i0], scope: DummyModule;
%8 : Float() = sum(%7), uses = [%0.i0], scope: DummyModule;
return (%8);
}
Looks like there is still a problem in v0.3.1. I think this bug needs to be reported to the onnx developers.
Maybe related, I have noticed a similar issue in 0.3.1 when using slices. Example below:
import torch
from torch.autograd import Variable
from tensorboardX import SummaryWriter

class DummyModule(torch.nn.Module):
    def forward(self, x):
        V = Variable(torch.Tensor(2, 2))
        V[0, 0] = x[0:1]
        # V[0, 0] = x[0]
        return torch.sum(V * 3)

x = Variable(torch.Tensor([1, 1, 1]), requires_grad=True)
r = DummyModule()(x)
r.backward()
print(x.grad)

w = SummaryWriter()
x = Variable(torch.Tensor([1, 1, 1]), requires_grad=True)
w.add_graph(DummyModule(), x, verbose=True)
Which gives the following output. As before, switching the commented lines removes the problem, so it seems to be particular to slicing.
Variable containing:
3
0
0
[torch.FloatTensor of size 3]
Traceback (most recent call last):
File "slice_grad.py", line 22, in <module>
w.add_graph(DummyModule(), x, verbose=True)
File "/Users/filiped/anaconda/lib/python3.6/site-packages/tensorboardX-1.0-py3.6.egg/tensorboardX/writer.py", line 400, in add_graph
self.file_writer.add_graph(graph(model, input_to_model, verbose))
File "/Users/filiped/anaconda/lib/python3.6/site-packages/tensorboardX-1.0-py3.6.egg/tensorboardX/graph.py", line 54, in graph
torch.onnx._optimize_trace(trace)
File "/Users/filiped/anaconda/lib/python3.6/site-packages/torch/onnx/__init__.py", line 81, in _optimize_trace
torch._C._jit_pass_onnx(trace)
File "/Users/filiped/anaconda/lib/python3.6/site-packages/torch/onnx/__init__.py", line 148, in _run_symbolic_method
return symbolic_fn(*args)
File "/Users/filiped/anaconda/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 77, in symbolic
raise ValueError('Unsupported index type {}'.format(type(index)))
ValueError: Unsupported index type <class 'slice'>
Looks like this PR implements slice. https://github.com/pytorch/pytorch/pull/5204
That was only merged into 0.4, right? I just tested and it doesn't seem to fix the 0.4 problem.
Tested with 0.4.0a0+063946d and still the same.
I modified the module to:
class DummyModule(torch.nn.Module):
    def __init__(self):
        super(DummyModule, self).__init__()
        self.V = torch.nn.Parameter(torch.Tensor(2, 2))

    def forward(self, x):
        self.V[0, 0] = x
        return torch.sum(self.V)
but now r.backward() triggers:
File "/home/dexter/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 81, in backward
variables, grad_variables, retain_graph, create_graph)
RuntimeError: leaf variable has been moved into the graph interior
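The error above comes from writing in place into a leaf tensor that requires grad (here, the Parameter itself). A minimal sketch of one way around it is to write into a clone and leave the Parameter untouched. ClonedParamModule is a hypothetical name, the code assumes a recent PyTorch, and it uses zeros and x[0] to keep the result deterministic; it only addresses the autograd error, not the graph export.

```python
import torch


class ClonedParamModule(torch.nn.Module):
    # Hypothetical variant: the Parameter stays a leaf, and the index
    # assignment is applied to a non-leaf clone of it.
    def __init__(self):
        super().__init__()
        self.V = torch.nn.Parameter(torch.zeros(2, 2))

    def forward(self, x):
        V = self.V.clone()   # non-leaf copy; in-place writes are allowed
        V[0, 0] = x[0]       # overwrite one entry with the input
        return torch.sum(V)


model = ClonedParamModule()
x = torch.tensor([1.0], requires_grad=True)
r = model(x)
r.backward()

print(x.grad)        # gradient with respect to the input
print(model.V.grad)  # gradient with respect to the parameter
```

The overwritten entry of the parameter receives zero gradient, because its value no longer contributes to the output.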
I got a similar issue without tensorboardX and torch.nn.Parameter.
I simply use a torch.Tensor (dtype=float64) and try to set some values in it. I even tried to use the scatter_ function, but it did not work either.
My code is basically:
import torch
# initialize tensor
tensor = torch.zeros((1, 400, 400)).double()
tensor.requires_grad_(True)
# create index ranges
x_range = torch.arange(150, 250).double()
x_range.requires_grad_(True)
y_range = torch.arange(150, 250).double()
y_range.requires_grad_(True)
# get indices of flattened tensor
x_range = x_range.long().repeat(100, 1)
y_range = y_range.long().repeat(100, 1)
y_range = y_range.t()
tensor_size = tensor.size()
indices = y_range.sub(1).mul(tensor_size[2]).add(x_range).view((1, -1))
# create patch
patch = torch.ones((1, 100, 100)).double()
# flatten tensor
tensor_flattened = tensor.contiguous().view((1, -1))
# set patch to cells of tensor_flattened at indices and reshape tensor
tensor_flattened.scatter_(1, indices, patch.view(1, -1))
tensor = tensor_flattened.view(tensor_size)
# sum up for scalar output for calling backward()
tensor_sum = tensor.sum()
# calling backward()
tensor_sum.backward()
# alternative to avoid summing tensor:
tensor.backward(torch.ones_like(tensor))
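The snippet above applies the in-place scatter_ to a view of a leaf tensor that requires grad, which autograd rejects. A minimal sketch of the same idea using the out-of-place Tensor.scatter is shown here on a small 4x4 example; the names base, patch, rows and cols are hypothetical, and a recent PyTorch is assumed.

```python
import torch

# Small stand-ins for the 400x400 tensor and the 100x100 patch.
base = torch.zeros(1, 4, 4, dtype=torch.float64, requires_grad=True)
patch = torch.ones(1, 2, 2, dtype=torch.float64, requires_grad=True)

# Flat indices of the 2x2 block at rows 1..2, columns 1..2.
rows = torch.arange(1, 3)
cols = torch.arange(1, 3)
indices = (rows.unsqueeze(1) * base.size(2) + cols.unsqueeze(0)).reshape(1, -1)

# Out-of-place scatter returns a new tensor and leaves the leaf untouched.
flat = base.reshape(1, -1)
out = flat.scatter(1, indices, patch.reshape(1, -1)).view(base.size())

# Reduce to a scalar and backpropagate.
total = out.sum()
total.backward()

print(base.grad)   # zero where the patch overwrote the base
print(patch.grad)  # one for every patch element
```

Because the result is a new tensor, gradients reach both the original tensor (outside the patched cells) and the patch.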
Seems like this issue is not caused by tensorboardX.
Update: still not working with the pytorch 0.4 release + tensorboardX master. Output of tensorboardX:
Error occurs, No graph saved
Checking if it's onnx problem...
Your model fails onnx too, please report to onnx team
Has someone reported this to the ONNX team? I'm busy right now but could do it in a couple of weeks if not.
I'm getting the same error when converting a multidimensional numpy array to a tensor inside forward.
@sgarcia22 How can a numpy array cause this error?
This issue is solved in the new onnx. Closing this.
Is this really fixed? Running the following code, I get the output shown after it:
import torch
from tensorboardX import SummaryWriter

class DummyModule(torch.nn.Module):
    def forward(self, x):
        V = torch.zeros(2, 2)
        V[0, 0] = x
        # V = x
        return torch.sum(V * 3)

x = torch.tensor([1.0], requires_grad=True)
r = DummyModule()(x)
r.backward()
print(x.grad)

w = SummaryWriter()
x = torch.tensor([1.0], requires_grad=True)
w.add_graph(DummyModule(), x, verbose=True)
tensor([3.])
graph(%0 : Float(1)) {
%1 : Float() = onnx::Constant[value={0}]()
return (%1);
}
And again, with the V = x line uncommented, I get
tensor([3.])
graph(%0 : Float(1)) {
%1 : Tensor = onnx::Constant[value={3}](), scope: DummyModule
%2 : Float(1) = onnx::Mul(%0, %1), scope: DummyModule
%3 : Float() = onnx::ReduceSum[keepdims=0](%2), scope: DummyModule
return (%3);
}
Interesting. I closed this because the onnx error had disappeared. Reopening it.
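While the issue is open, one possible way to sidestep the in-place index assignment is to build V out of place from a constant mask. This is only a sketch under the assumption that every entry of V other than [0, 0] should be zero. MaskedModule is a hypothetical name, and whether add_graph renders this variant correctly would still need to be checked.

```python
import torch


class MaskedModule(torch.nn.Module):
    # Hypothetical variant of DummyModule with no index assignment:
    # a constant mask selects the single entry that should hold x.
    def forward(self, x):
        mask = torch.tensor([[1.0, 0.0],
                             [0.0, 0.0]])
        V = mask * x  # x of shape (1,) broadcasts to (2, 2)
        return torch.sum(V * 3)


x = torch.tensor([1.0], requires_grad=True)
r = MaskedModule()(x)
r.backward()
print(x.grad)
```

The gradient matches the original example, since only the masked entry depends on x.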