pytorch-yolo-v3

Error when trying to convert darknet.py to ONNX

Open fernandojunior opened this issue 6 years ago • 12 comments

Hi!

I'm trying to write a script that converts the darknet.py PyTorch model to an ONNX model on the CPU, based on this example:

# onnx-convert.py

from darknet import Darknet

import torch 
import torch.onnx
from torch.autograd import Variable
import torchvision

cfgfile = './cfg/yolov3.cfg'
weightfile =  './yolov3.weights'
imgfile = './imgs/dog.jpg'
resolution = 416

model = Darknet(cfgfile)
model.load_weights(weightfile)

dummy_input = Variable(torch.randn(1, 3, resolution, resolution))
torch.onnx.export(model, dummy_input, "yolo.onnx")

But when I run python onnx-convert.py, the following error happens:

TypeError: forward() missing 1 required positional argument: 'CUDA'

I changed the Darknet#forward(self, x, CUDA) member signature to Darknet#forward(self, x, CUDA = False). Now the following error happens:

RuntimeError: invalid argument 2: size '[1 x 255 x 2809]' is invalid for input with 689520 elements at /pytorch/aten/src/TH/THStorage.c:41
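The numbers in this error are consistent with a mismatch between the dummy input size and the height stored in the cfg. The sketch below redoes the arithmetic in plain Python. It assumes that predict_transform in util.py derives the grid size as inp_dim // (inp_dim // feature_map_size), and it uses a cfg height of 320 purely as an assumed value that happens to reproduce the message; neither detail is confirmed in this thread.

```python
def expected_grid_size(inp_dim: int, feature_map_size: int) -> int:
    """Grid size as predict_transform is ASSUMED to compute it."""
    stride = inp_dim // feature_map_size
    return inp_dim // stride


# A 416x416 input produces 13x13, 26x26 and 52x52 feature maps in YOLOv3.
feature_maps = [13, 26, 52]
channels = 255  # 3 anchors * (4 box values + 1 objectness + 80 classes)

# 320 is an assumed cfg height; 416 matches the dummy input.
for cfg_height in (320, 416):
    for fm in feature_maps:
        grid = expected_grid_size(cfg_height, fm)
        actual_elements = channels * fm * fm
        status = "ok" if grid == fm else "MISMATCH"
        print(
            f"cfg height {cfg_height}, feature map {fm}x{fm}: "
            f"view expects [1 x {channels} x {grid * grid}], "
            f"tensor has {actual_elements} elements ({status})"
        )
```

Under these assumptions the 52x52 feature map holds 255 * 52 * 52 = 689520 elements, while a cfg height of 320 asks for a 53x53 grid (2809 cells). Making net_info["height"] equal to the input resolution removes that mismatch.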

I changed my script to include the following line:

from darknet import Darknet

...
model = Darknet(cfgfile)
model.net_info["height"] = resolution
...
torch.onnx.export(model, dummy_input, "yolo.onnx")

The following error happens: RuntimeError: /pytorch/torch/csrc/jit/tracer.h:120: getTracingState: Assertion state failed.

Can someone help me?

fernandojunior avatar Jul 04 '18 14:07 fernandojunior

I have this exact same issue and have been trying to solve it for days. I even went through every relevant file and set CUDA to False or removed the CUDA variable where possible. I started modifying the PyTorch files themselves but still had no luck. I really hope someone (or even @ayooshkathuria ) is able to find a way to complete this conversion from PyTorch to ONNX or help us debug this error. I, personally, can't get past this error:

TypeError: forward() missing 1 required positional argument: 'CUDA'

EDIT: @fernandojunior How did you define the resolution?

momenabdelkarim avatar Jul 05 '18 21:07 momenabdelkarim

Hey @momenabdelkarim

I just changed line 307 of the darknet.py module from def forward(self, x, CUDA) to def forward(self, x, CUDA=False).
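If editing darknet.py is not desirable, one possible alternative is to wrap the model in a small module whose forward takes only the input tensor. This is a sketch, not something from the repository: ExportWrapper and ToyModel are hypothetical names, and ToyModel only stands in for a model whose forward has the (x, CUDA) signature.

```python
import torch
from torch import nn


class ExportWrapper(nn.Module):
    """Hypothetical helper: fixes the CUDA argument so forward(x) is enough."""

    def __init__(self, model: nn.Module, use_cuda: bool = False):
        super().__init__()
        self.model = model
        self.use_cuda = use_cuda

    def forward(self, x):
        # Pass the stored flag as the second positional argument.
        return self.model(x, self.use_cuda)


class ToyModel(nn.Module):
    """Stand-in with the same forward(x, CUDA) signature as Darknet."""

    def forward(self, x, CUDA):
        return x * 2


wrapped = ExportWrapper(ToyModel())
out = wrapped(torch.ones(1, 3))
print(out)
```

With the real model, the wrapped object would be the one handed to the exporter, for example torch.onnx.export(ExportWrapper(model), dummy_input, "yolo.onnx"). This only addresses the missing-argument TypeError, not the later errors in this thread.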

fernandojunior avatar Jul 06 '18 13:07 fernandojunior

@fernandojunior

I mean how is resolution defined in this part of your code:

from darknet import Darknet
...
model = Darknet(cfgfile)
model.net_info["height"] = resolution
...
torch.onnx.export(model, dummy_input, "yolo.onnx")

momenabdelkarim avatar Jul 06 '18 19:07 momenabdelkarim

Before:

# onnx-convert.py

from darknet import Darknet

import torch 
import torch.onnx
from torch.autograd import Variable
import torchvision

cfgfile = './cfg/yolov3.cfg'
weightfile =  './yolov3.weights'
imgfile = './imgs/dog.jpg'
resolution = 416

model = Darknet(cfgfile)
model.load_weights(weightfile)

dummy_input = Variable(torch.randn(1, 3, resolution, resolution))
torch.onnx.export(model, dummy_input, "yolo.onnx")

After:

# onnx-convert.py

from darknet import Darknet

import torch 
import torch.onnx
from torch.autograd import Variable
import torchvision

cfgfile = './cfg/yolov3.cfg'
weightfile =  './yolov3.weights'
imgfile = './imgs/dog.jpg'
resolution = 416

model = Darknet(cfgfile)
model.load_weights(weightfile)
model.net_info["height"] = resolution # >>> Added line

dummy_input = Variable(torch.randn(1, 3, resolution, resolution))
torch.onnx.export(model, dummy_input, "yolo.onnx")

Error after change: RuntimeError: /pytorch/torch/csrc/jit/tracer.h:120: getTracingState: Assertion state failed.

fernandojunior avatar Jul 10 '18 19:07 fernandojunior

ONNX might not support one of the operations being used. I tried the same with another PyTorch implementation (https://github.com/marvis/pytorch-yolo3) and got an "Assertion state failed" error as well.

sgarcia22 avatar Jul 11 '18 23:07 sgarcia22

I have also been struggling with this issue, so I followed his tutorial and rebuilt the model step by step to see where it broke.

It broke here:

# darknet.py lines ~364
         elif module_type == 'yolo':        
                
                anchors = self.module_list[i][0].anchors
                #Get the input dimensions
                inp_dim = int (self.net_info["height"])
                
                #Get the number of classes
                num_classes = int (modules[i]["classes"])
                
                #Output the result
                x = x.data
                x = predict_transform(x, inp_dim, anchors, num_classes, CUDA)

Since data isn't documented as a property, I looked at the torch.Tensor source code and found this:

    def __setstate__(self, state):
        if not self.is_leaf:
            raise RuntimeError('__setstate__ can be only called on leaf Tensors')
        if len(state) == 4:
            # legacy serialization of Tensor
            self.set_(*state)
            return
        elif len(state) == 5:
            # legacy serialization of Variable
            self.data = state[0]
            state = (state[3], state[4], state[2])
        self.requires_grad, _, self._backward_hooks = state

So if you comment out x = x.data then the code exports to ONNX.

NO IDEA if it messes it up or not, haven't tested that yet.
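As a rough check of what dropping that line changes numerically, the sketch below compares a tensor with its .data view in a recent PyTorch release. It is a standalone illustration with made-up values, not code from darknet.py, and it says nothing about whether the exported ONNX graph is faithful.

```python
import torch

# A leaf tensor that requires gradients, and a result computed from it.
w = torch.tensor([1.0, 2.0], requires_grad=True)
x = w * 3.0  # non-leaf tensor that carries autograd history

# .data returns a tensor with the same values that is cut off from the graph.
detached = x.data

print(torch.equal(detached, x))   # values are identical
print(x.grad_fn is not None)      # x remembers how it was computed
print(detached.grad_fn is None)   # the .data view does not
print(detached.requires_grad)     # and it does not track gradients
```

Since .data only removes the graph history and leaves the values alone, removing the line should not change the forward-pass numbers by itself.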

HtH

sfurlani avatar Jul 13 '18 20:07 sfurlani

@sfurlani Great, it worked for me, but when I inspect the ONNX network in a visualizer it no longer has any Yolo layers. Could that be what's broken?

momenabdelkarim avatar Jul 16 '18 01:07 momenabdelkarim

@momenabdelkarim the 'Yolo' layer is defined in code as the DetectionLayer in darknet.py and it calls from util.py:

def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True):

When the ONNX exporter's JIT processes the network, this is all it outputs for the Yolo layer (I'm using yolov3-tiny, so your numbers might be different):

# Yolo
48/50: Converting Node Type Reshape
49/50: Converting Node Type Transpose
50/50: Converting Node Type Reshape

What's missing from predict_transform() are the sigmoid functions, the anchors, and the offsets. Basically the whole thing...
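To make concrete what those missing steps do, here is a plain-Python sketch of the standard YOLOv3 box decoding for a single grid cell and a single anchor. It is not the repository's predict_transform (which works on whole tensors); decode_box and its parameter names are hypothetical, and the anchor is assumed to be given in input-image pixels.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(raw, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Decode one raw prediction [tx, ty, tw, th, obj, class scores...]."""
    tx, ty, tw, th, obj = raw[:5]
    # Sigmoid keeps the centre inside its cell; the cell offset and stride
    # map it back to input-image pixels.
    center_x = (sigmoid(tx) + cell_x) * stride
    center_y = (sigmoid(ty) + cell_y) * stride
    # Width and height scale the anchor box exponentially.
    width = anchor_w * math.exp(tw)
    height = anchor_h * math.exp(th)
    objectness = sigmoid(obj)
    class_scores = [sigmoid(c) for c in raw[5:]]
    return center_x, center_y, width, height, objectness, class_scores


# All-zero raw output in cell (6, 6) of a 13x13 grid (stride 32),
# using a 116x90 anchor.
box = decode_box([0.0] * 7, cell_x=6, cell_y=6,
                 anchor_w=116, anchor_h=90, stride=32)
print(box)
```

If the exported graph contains only Reshape and Transpose nodes, none of this arithmetic is in the ONNX model and it would have to be applied to the raw outputs afterwards.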

After a brief look at the PyTorch ONNX exporter, I don't think it will accurately export these steps without a massive amount of work.

Good luck, HtH

sfurlani avatar Jul 16 '18 14:07 sfurlani

Hi there! Anyone had any success working with exporting this model? I'm wondering if it's possible right now...

ThatAIGeek avatar Aug 16 '18 15:08 ThatAIGeek

https://github.com/YunYang1994/tensorflow-yolov3

YunYang1994 avatar Dec 02 '18 16:12 YunYang1994

@fernandojunior @momenabdelkarim @sgarcia22 @sfurlani @ThatAIGeek this YOLOv3 tutorial may help: https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data

The accompanying repository performs ONNX export correctly, works on MacOS, Windows and Linux, includes multigpu and multithreading, performs inference on images, videos, webcams, and an iOS app. It also tests to slightly higher mAPs than darknet, including on the latest YOLOv3-SPP.weights (60.7 COCO mAP), and offers the ability to train custom datasets from scratch to darknet performance, all using PyTorch :) https://github.com/ultralytics/yolov3



fourth-archive avatar Apr 11 '19 17:04 fourth-archive

https://github.com/Rapternmn/PyTorch-Onnx-Tensorrt

Rapternmn avatar Dec 15 '19 09:12 Rapternmn