
C++ inference using a TorchScript-exported torchvision model fails with an error

Open · Yasin40 opened this issue 2 years ago · 21 comments

🐛 C++ inference using a TorchScript-exported torchvision model fails with an error

I'm trying to use this approach to build my model (MobileNetV3 small) from torchvision models. The training and validation phases (Python) worked without any problem, but after saving the TorchScript model for use in C++ inference, I got this error:

terminate called after throwing an instance of 'torch::jit::ErrorReport'
  what():  
Unknown type name 'NoneType':
Serialized   File "code/__torch__/torch/nn/modules/linear.py", line 6
  training : bool
  _is_full_backward_hook : Optional[bool]
  def forward(self: __torch__.torch.nn.modules.linear.Identity) -> NoneType:
                                                                   ~~~~~~~~ <--- HERE
    return None
class Linear(Module):

Aborted (core dumped)

My simplified Torchscript exporting code:

import torch
import torch.nn as nn
from torchvision import models

num_classes = 14
use_pretrained = True  # ImageNet weights; overwritten by the checkpoint below
device = torch.device('cpu')
model = models.mobilenet_v3_small(pretrained=use_pretrained)
num_ftrs = model.classifier[3].in_features
model.classifier[3] = nn.Linear(num_ftrs, num_classes)  # replace the classifier head
model = model.to(device)
checkpoint = torch.load('checkpoint/best_model_MobBsconv_ckpt.t7', map_location=device)
model.load_state_dict(checkpoint['model'])

# Dummy input for tracing
img = torch.rand(1, 3, 224, 224).to(device)
model.eval()
ts = torch.jit.trace(model, img, strict=False)
ts.save("traced_mob_bsconv_model.pt")

This exporting script runs successfully, but using the model from C++ produces the error. This is my simplified C++ code, which works for other models:

try {
    this->module = torch::jit::load(ModelAddress);
} catch (const c10::Error& e) {
    std::cerr << "error loading the model: " << e.what() << std::endl;
    std::exit(EXIT_FAILURE);
}
half_ = (device_ != torch::kCPU);
this->module.to(device_);
if (half_) {
    module.to(torch::kHalf);
}
torch::NoGradGuard no_grad;
module.eval();

I already get the error at this initialization step, while my other exported models load fine here and also work at forward and beyond. I'm confused and need help.
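
For reproduction, here is a self-contained sketch of the same load-and-setup flow outside the class (a minimal sketch only; ModelAddress, device_, and half_ are taken from the snippet above, and the usual libtorch/CMake setup is assumed):

// standalone version of the loading flow above
#include <torch/script.h>

#include <cstdlib>
#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
    // Path to the TorchScript file produced by the Python export script.
    const std::string ModelAddress = argc > 1 ? argv[1] : "traced_mob_bsconv_model.pt";
    const torch::Device device_ = torch::kCPU;

    torch::jit::Module module;
    try {
        module = torch::jit::load(ModelAddress);  // the ErrorReport above is thrown here
    } catch (const c10::Error& e) {
        std::cerr << "error loading the model: " << e.what() << std::endl;
        return EXIT_FAILURE;
    }

    const bool half_ = (device_ != torch::kCPU);  // half precision only off-CPU
    module.to(device_);
    if (half_) {
        module.to(torch::kHalf);
    }
    torch::NoGradGuard no_grad;
    module.eval();
    std::cout << "model loaded" << std::endl;
    return EXIT_SUCCESS;
}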

Environment

env 1: system used to train and export the TorchScript model (with the code above):

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.10.2
Libc version: glibc-2.25

Python version: 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-48-generic-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce GTX 1060 6GB
Nvidia driver version: 450.66
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] torch==1.9.0
[pip3] torchvision==0.10.0

env 2: system that runs the C++ code and produces the error:

OS: Ubuntu 18.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.15

Python version: 2.7.17 (default, Jul 20 2020, 15:37:01) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-42-generic-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: N/A
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.18.1

Additional context

Yasin40 · Aug 12 '21 17:08

Hi,

Was the version of PyTorch and torchvision used to run on the second environment the same as in the machine used to export the model?

fmassa · Aug 13 '21 12:08

Hi,

Was the version of PyTorch and torchvision used to run on the second environment the same as in the machine used to export the model?

Thanks for the attention. In the second environment I use the model for C++ inference with libtorch. I tested libtorch 1.9.0 and the latest (preview/nightly) CPU version. The torch & torchvision versions are: PyTorch 1.9.0+cu102, torchvision 0.10.0+cu102.
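
One quick way to rule out a runtime mismatch is to print the version libtorch was built with from the C++ side and compare it with torch.__version__ in Python. A minimal sketch, assuming the libtorch distribution ships the generated torch/version.h header, as recent releases do:

// print the libtorch version this binary is compiled against
#include <torch/version.h>

#include <iostream>

int main() {
    std::cout << "libtorch " << TORCH_VERSION_MAJOR << "."
              << TORCH_VERSION_MINOR << "." << TORCH_VERSION_PATCH << std::endl;
    return 0;
}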

Yasin40 · Aug 13 '21 14:08

@eellison who would be a good POC from the team to have a look?

fmassa · Aug 13 '21 14:08

Also, @Yasin40 have you tried re-exporting the model in Python and running it again in C++?

fmassa · Aug 13 '21 14:08

Also, @Yasin40 have you tried re-exporting the model in Python and running it again in C++?

Yes. I also tested it this way:

import torch
from torchvision import models

device = torch.device('cpu')
model = models.mobilenet_v3_small(pretrained=False)
model = model.to(device)
# Load the official FP32 weights
checkpoint = torch.load('mobilenet_v3_small-047dcff4.pth', map_location=device)
model.load_state_dict(checkpoint)

img = torch.rand(1, 3, 224, 224).to(device)
model.eval()
ts = torch.jit.trace(model, img, strict=False)
ts.save("traced_mobnet__pt_model.pt")

But I got the same error.

Yasin40 · Aug 13 '21 15:08

Frontend-related error, cc @gmagogsfm

eellison · Aug 16 '21 18:08

Can anyone help me? I tested resnet18 from the torchvision models instead of MobileNetV3 and it works. What is the problem with MobileNetV3 and Linear?

Yasin40 · Aug 18 '21 08:08

@Yasin40 This looks like an error in PyTorch's nn module. The reason why only MobileNetV3 is affected is that it uses the Identity module, which seems to have an issue.

It might be worth filing this on PyTorch core.

datumbox · Aug 18 '21 09:08

Can anyone help me?

Yasin40 · Aug 25 '21 10:08

@gmagogsfm can you have a look? Seems like an issue in the interpreter

fmassa · Aug 26 '21 12:08

I tried to write an example with the Identity module on top-of-trunk PyTorch; the issue is not reproducible.

Could you try loading a toy module generated by this script in your environment?

import torch

class JitModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.id = torch.nn.Identity()

    def forward(self, x: torch.Tensor):
        return self.id(x)

m = torch.jit.script(JitModule())
torch.jit.save(m, "identity_saved_module.pt")
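
For the C++ side of this test, a loader along these lines could be used (a minimal sketch, assuming the same libtorch setup as the existing inference code):

// load the toy module in the failing C++ environment and run forward
#include <torch/script.h>

#include <cstdlib>
#include <iostream>

int main() {
    torch::jit::Module m;
    try {
        m = torch::jit::load("identity_saved_module.pt");  // file saved by the script above
    } catch (const c10::Error& e) {
        std::cerr << "error loading the model: " << e.what() << std::endl;
        return EXIT_FAILURE;
    }
    torch::NoGradGuard no_grad;
    m.eval();
    // Identity just returns its input, so any tensor shape works.
    auto out = m.forward({torch::rand({2, 3})}).toTensor();
    std::cout << "loaded, output sizes: " << out.sizes() << std::endl;
    return EXIT_SUCCESS;
}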

gmagogsfm · Aug 27 '21 01:08

This is the serialized code when compiled in my environment:

  def forward(self: __torch__.torch.nn.modules.linear.Identity, input: Tensor) -> Tensor:

As you can see it is different from the one shown in your example:

def forward(self: __torch__.torch.nn.modules.linear.Identity) -> NoneType:

The forward signature from your example seems wrong as it doesn't take any tensor input, but Identity requires a tensor input: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/linear.py#L35

gmagogsfm · Aug 27 '21 01:08

I tried to write an example with the Identity module on top-of-trunk PyTorch; the issue is not reproducible.

Could you try loading a toy module generated by this script in your environment?

Thanks. Yes, it loaded successfully, but my problem is with using torchvision.models and its MobileNetV3. How can I fix it?

Yasin40 · Aug 29 '21 15:08

@Yasin40 instead of torch.jit.trace, can you try using torch.jit.script on your model?

fmassa · Sep 01 '21 12:09

Thanks. Yes, it loaded successfully, but my problem is with using torchvision.models and its MobileNetV3. How can I fix it?

I will try using mobilenetv3 directly to see if I can reproduce.

gmagogsfm · Sep 01 '21 15:09

@Yasin40 instead of torch.jit.trace, can you try using torch.jit.script on your model?

Yes, I tested torch.jit.script and the error is not affected.

Yasin40 · Sep 08 '21 06:09

@Yasin40 so you mean that torch.jit.script works successfully? If that's the case, then I believe we can close this issue, as torch.jit.script effectively replaces torch.jit.trace whenever the model can be scripted

fmassa · Sep 08 '21 07:09

@Yasin40 so you mean that torch.jit.script works successfully? If that's the case, then I believe we can close this issue, as torch.jit.script effectively replaces torch.jit.trace whenever the model can be scripted

No, it doesn't work.

Yasin40 · Sep 08 '21 08:09

Maybe the reason is an incompatible torch version. I also encountered this problem because I trained the model with torch 1.9.1 and ran the deployment service with torch 1.8.0. But I solved the problem when I changed torch to 1.10.0. (2021.11.11)

lizhi1215 · Nov 11 '21 09:11

@Yasin40 so you mean that torch.jit.script works successfully? If that's the case, then I believe we can close this issue, as torch.jit.script effectively replaces torch.jit.trace whenever the model can be scripted

No, it doesn't work.

Hi, has there been any progress on this issue? I face the same problem, but with the detection models in torchvision: torch.jit.trace produces an error when exporting due to the output format, and torch.jit.script exports the model successfully, but the C++ lib fails to load it.

DuyHuynhLe · Feb 15 '22 14:02

Have you solved this problem? I also exported successfully with torch.jit.script, but the C++ lib fails to load it.

terminate called after throwing an instance of 'torch::jit::ErrorReport'
  what():  
Unknown type name 'NoneType':
Serialized   File "code/__torch__/torch/nn/modules/container.py", line 5
  __buffers__ = []
  training : bool
  _is_full_backward_hook : NoneType
                           ~~~~~~~~ <--- HERE
  __annotations__["VEHICLE/node_history_encoder"] = __torch__.torch.nn.modules.rnn.LSTM
  __annotations__["VEHICLE/node_future_encoder"] = __torch__.torch.nn.modules.rnn.___torch_mangle_0.LSTM

zhaowenyi7 · Jul 04 '22 08:07

Maybe the reason is an incompatible torch version. I also encountered this problem because I trained the model with torch 1.9.1 and ran the deployment service with torch 1.8.0. But I solved the problem when I changed torch to 1.10.0. (2021.11.11)

@lizhi1215 Could you please tell me whether 1.10.0 is the version for the deployment service or for training the model?

Eliza-and-black · Apr 11 '23 06:04