
C++ inference using a TorchScript-exported torchvision model fails with an error

Open · Yasin40 opened this issue 2 years ago · 21 comments

🐛 C++ inference using a TorchScript-exported torchvision model fails with an error

I'm trying to use this approach to build my model (MobileNetV3 small) from torchvision models. The training and validation phases (Python) worked without any problem, but after saving the TorchScript model for use in C++ inference, I got this error:

terminate called after throwing an instance of 'torch::jit::ErrorReport'
  what():  
Unknown type name 'NoneType':
Serialized   File "code/__torch__/torch/nn/modules/linear.py", line 6
  training : bool
  _is_full_backward_hook : Optional[bool]
  def forward(self: __torch__.torch.nn.modules.linear.Identity) -> NoneType:
                                                                   ~~~~~~~~ <--- HERE
    return None
class Linear(Module):

Aborted (core dumped)

My simplified Torchscript exporting code:

import torch
import torch.nn as nn
from torchvision import models

num_classes = 14
use_pretrained = True  # ImageNet weights; overwritten by the checkpoint below
device = torch.device('cpu')
model = models.mobilenet_v3_small(pretrained=use_pretrained)
num_ftrs = model.classifier[3].in_features
model.classifier[3] = nn.Linear(num_ftrs, num_classes)  # replace the classifier head
model = model.to(device)
checkpoint = torch.load('checkpoint/best_model_MobBsconv_ckpt.t7', map_location=device)
model.load_state_dict(checkpoint['model'])

# Dummy input for tracing
img = torch.rand(1, 3, 224, 224).to(device)
model.eval()
ts = torch.jit.trace(model, img, strict=False)
ts.save("traced_mob_bsconv_model.pt")

This exporting script runs successfully, but using the model from C++ produces the error. This is my simplified C++ code, which works for other models:

try {
    this->module = torch::jit::load(ModelAddress);
} catch (const c10::Error& e) {
    std::cerr << "error loading the model: " << e.what() << std::endl;
    std::exit(EXIT_FAILURE);
}
half_ = (device_ != torch::kCPU);
this->module.to(device_);
if (half_) {
    module.to(torch::kHalf);
}
torch::NoGradGuard no_grad;
module.eval();

I already get the error at this initialization step, while my other exported models load fine here and also work at forward and beyond. I'm confused and need help.
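
For reproduction, here is a self-contained sketch of the same load-and-setup flow outside the class (a minimal sketch only; ModelAddress, device_, and half_ are taken from the snippet above, and the usual libtorch/CMake setup is assumed):

// standalone version of the loading flow above
#include <torch/script.h>

#include <cstdlib>
#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
    // Path to the TorchScript file produced by the Python export script.
    const std::string ModelAddress = argc > 1 ? argv[1] : "traced_mob_bsconv_model.pt";
    const torch::Device device_ = torch::kCPU;

    torch::jit::Module module;
    try {
        module = torch::jit::load(ModelAddress);  // the ErrorReport above is thrown here
    } catch (const c10::Error& e) {
        std::cerr << "error loading the model: " << e.what() << std::endl;
        return EXIT_FAILURE;
    }

    const bool half_ = (device_ != torch::kCPU);  // half precision only off-CPU
    module.to(device_);
    if (half_) {
        module.to(torch::kHalf);
    }
    torch::NoGradGuard no_grad;
    module.eval();
    std::cout << "model loaded" << std::endl;
    return EXIT_SUCCESS;
}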

Environment

env 1: system used to train and export the TorchScript model (with the code above):

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.10.2
Libc version: glibc-2.25

Python version: 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-48-generic-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce GTX 1060 6GB
Nvidia driver version: 450.66
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] torch==1.9.0
[pip3] torchvision==0.10.0

env 2: system that runs the C++ code and produces the error:

OS: Ubuntu 18.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.15

Python version: 2.7.17 (default, Jul 20 2020, 15:37:01) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-42-generic-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: N/A
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.18.1

Additional context

Yasin40 · Aug 12 '21 17:08

Hi,

Was the version of PyTorch and torchvision used to run on the second environment the same as in the machine used to export the model?

fmassa · Aug 13 '21 12:08

Hi,

Was the version of PyTorch and torchvision used to run on the second environment the same as in the machine used to export the model?

Thanks for the attention. In the second environment I use the model for C++ inference with libtorch. I tested libtorch 1.9.0 and the latest (preview/nightly) CPU version. The torch & torchvision versions are: PyTorch 1.9.0+cu102, torchvision 0.10.0+cu102.
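
One quick way to rule out a runtime mismatch is to print the version libtorch was built with from the C++ side and compare it with torch.__version__ in Python. A minimal sketch, assuming the libtorch distribution ships the generated torch/version.h header, as recent releases do:

// print the libtorch version this binary is compiled against
#include <torch/version.h>

#include <iostream>

int main() {
    std::cout << "libtorch " << TORCH_VERSION_MAJOR << "."
              << TORCH_VERSION_MINOR << "." << TORCH_VERSION_PATCH << std::endl;
    return 0;
}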

Yasin40 · Aug 13 '21 14:08

@eellison who would be a good POC from the team to have a look?

fmassa · Aug 13 '21 14:08

Also, @Yasin40 have you tried re-exporting the model in Python and running it again in C++?

fmassa · Aug 13 '21 14:08

Also, @Yasin40 have you tried re-exporting the model in Python and running it again in C++?

Yes. I also tested it this way:

import torch
from torchvision import models

device = torch.device('cpu')
model = models.mobilenet_v3_small(pretrained=False)
model = model.to(device)
# Load the official FP32 weights
checkpoint = torch.load('mobilenet_v3_small-047dcff4.pth', map_location=device)
model.load_state_dict(checkpoint)

img = torch.rand(1, 3, 224, 224).to(device)
model.eval()
ts = torch.jit.trace(model, img, strict=False)
ts.save("traced_mobnet__pt_model.pt")

But I got the same error.

Yasin40 · Aug 13 '21 15:08

Frontend-related error, cc @gmagogsfm

eellison · Aug 16 '21 18:08

Can anyone help me? I tested resnet18 from the torchvision models instead of MobileNetV3 and it works. What is the problem with MobileNetV3 and Linear?

Yasin40 · Aug 18 '21 08:08

@Yasin40 This looks like an error in PyTorch's nn module. The reason why only MobileNetV3 is affected is that it uses the Identity module, which seems to have an issue.

It might be worth filing this on PyTorch core.

datumbox · Aug 18 '21 09:08

Can anyone help me?

Yasin40 · Aug 25 '21 10:08

@gmagogsfm can you have a look? Seems like an issue in the interpreter

fmassa · Aug 26 '21 12:08

I tried to write an example with the Identity module on top-of-trunk PyTorch; the issue is not reproducible.

Could you try loading a toy module generated by this script in your environment?

import torch

class JitModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.id = torch.nn.Identity()

    def forward(self, x: torch.Tensor):
        return self.id(x)

m = torch.jit.script(JitModule())
torch.jit.save(m, "identity_saved_module.pt")
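
For the C++ side of this test, a loader along these lines could be used (a minimal sketch, assuming the same libtorch setup as the existing inference code):

// load the toy module in the failing C++ environment and run forward
#include <torch/script.h>

#include <cstdlib>
#include <iostream>

int main() {
    torch::jit::Module m;
    try {
        m = torch::jit::load("identity_saved_module.pt");  // file saved by the script above
    } catch (const c10::Error& e) {
        std::cerr << "error loading the model: " << e.what() << std::endl;
        return EXIT_FAILURE;
    }
    torch::NoGradGuard no_grad;
    m.eval();
    // Identity just returns its input, so any tensor shape works.
    auto out = m.forward({torch::rand({2, 3})}).toTensor();
    std::cout << "loaded, output sizes: " << out.sizes() << std::endl;
    return EXIT_SUCCESS;
}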

gmagogsfm · Aug 27 '21 01:08

This is the serialized code when compiled in my environment:

  def forward(self: __torch__.torch.nn.modules.linear.Identity, input: Tensor) -> Tensor:

As you can see it is different from the one shown in your example:

def forward(self: __torch__.torch.nn.modules.linear.Identity) -> NoneType:

The forward signature from your example seems wrong as it doesn't take any tensor input, but Identity requires a tensor input: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/linear.py#L35

gmagogsfm · Aug 27 '21 01:08

I tried to write an example with the Identity module on top-of-trunk PyTorch; the issue is not reproducible.

Could you try loading a toy module generated by this script in your environment?

Thanks. Yes, it loaded successfully, but my problem is with using torchvision.models and its MobileNetV3. How can I fix it?

Yasin40 · Aug 29 '21 15:08

@Yasin40 instead of torch.jit.trace, can you try using torch.jit.script on your model?

fmassa · Sep 01 '21 12:09

Thanks. Yes, it loaded successfully, but my problem is with using torchvision.models and its MobileNetV3. How can I fix it?

I will try using mobilenetv3 directly to see if I can reproduce.

gmagogsfm · Sep 01 '21 15:09

@Yasin40 instead of torch.jit.trace, can you try using torch.jit.script on your model?

Yes, I tested torch.jit.script and the error is not affected.

Yasin40 · Sep 08 '21 06:09

@Yasin40 so you mean that torch.jit.script works successfully? If that's the case, then I believe we can close this issue, as torch.jit.script effectively replaces torch.jit.trace whenever the model can be scripted

fmassa · Sep 08 '21 07:09

@Yasin40 so you mean that torch.jit.script works successfully? If that's the case, then I believe we can close this issue, as torch.jit.script effectively replaces torch.jit.trace whenever the model can be scripted

No, it doesn't work.

Yasin40 · Sep 08 '21 08:09

Maybe the reason is an incompatible torch version. I also encountered this problem because I trained the model with torch 1.9.1 and ran the deployment service with torch 1.8.0. But I solved the problem when I changed torch to 1.10.0. (2021.11.11)

lizhi1215 · Nov 11 '21 09:11

@Yasin40 so you mean that torch.jit.script works successfully? If that's the case, then I believe we can close this issue, as torch.jit.script effectively replaces torch.jit.trace whenever the model can be scripted

No, it doesn't work.

Hi, has there been any progress on this issue? I face the same problem, but with the detection models in torchvision: torch.jit.trace produces an error when exporting due to the output format, and torch.jit.script exports the model successfully, but the C++ lib fails to load it.

DuyHuynhLe · Feb 15 '22 14:02

Have you solved this problem? I also exported successfully with torch.jit.script, but the C++ lib fails to load it.

terminate called after throwing an instance of 'torch::jit::ErrorReport'
  what():  
Unknown type name 'NoneType':
Serialized   File "code/__torch__/torch/nn/modules/container.py", line 5
  __buffers__ = []
  training : bool
  _is_full_backward_hook : NoneType
                           ~~~~~~~~ <--- HERE
  __annotations__["VEHICLE/node_history_encoder"] = __torch__.torch.nn.modules.rnn.LSTM
  __annotations__["VEHICLE/node_future_encoder"] = __torch__.torch.nn.modules.rnn.___torch_mangle_0.LSTM

zhaowenyi7 · Jul 04 '22 08:07

Maybe the reason is an incompatible torch version. I also encountered this problem because I trained the model with torch 1.9.1 and ran the deployment service with torch 1.8.0. But I solved the problem when I changed torch to 1.10.0. (2021.11.11)

@lizhi1215 Could you please tell me whether 1.10.0 is the version for the deployment service or for training the model?

Eliza-and-black · Apr 11 '23 06:04