[Bug] `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.
Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. I have read the FAQ documentation but cannot get the expected help.
- [ ] 3. The bug has not been fixed in the latest version.
Describe the bug
I tried both absolute and relative checkpoint paths, but I keep getting the same error.
Reproduction
python tools/deploy.py configs/mmseg/segmentation_tensorrt-fp16_dynamic-512x1024-2048x2048.py ../mmsegmentation/configs/A-myconfig/ocrnet_hr48_4xb2-160k_cityscapes-512x1024_road.py ../mmsegmentation/configs/A-myconfig/iter_40000_12.pth ./road/999_104003357.jpg --work-dir mmdeploy_models/mmseg/ort --device cuda --show --dump-info
I changed the number of classes to 4 and rewrote the dataset type as a custom one (see the sketch below).
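A 4-class custom dataset type in mmseg 1.x is usually registered along the lines of the following sketch. This is purely illustrative: the class names, palette, and file suffixes are assumptions, not the actual `BadRoadDataset` implementation referenced in the config further down.

```python
# Illustrative sketch only: how a 4-class custom dataset is typically
# registered in mmseg 1.x. Class names, palette and suffixes are assumptions.
from mmseg.datasets import BaseSegDataset
from mmseg.registry import DATASETS


@DATASETS.register_module()
class BadRoadDataset(BaseSegDataset):
    METAINFO = dict(
        classes=('background', 'crack', 'pothole', 'patch'),  # placeholders
        palette=[[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255]])

    def __init__(self, **kwargs) -> None:
        # Suffixes depend on how the masks were exported.
        super().__init__(img_suffix='.png', seg_map_suffix='.png', **kwargs)
```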
Environment
10/09 10:57:37 - mmengine - INFO -
10/09 10:57:37 - mmengine - INFO - **********Environmental information**********
10/09 10:57:38 - mmengine - INFO - sys.platform: linux
10/09 10:57:38 - mmengine - INFO - Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
10/09 10:57:38 - mmengine - INFO - CUDA available: True
10/09 10:57:38 - mmengine - INFO - numpy_random_seed: 2147483648
10/09 10:57:38 - mmengine - INFO - GPU 0: NVIDIA TITAN X (Pascal)
10/09 10:57:38 - mmengine - INFO - CUDA_HOME: /usr/local/cuda-10.2
10/09 10:57:38 - mmengine - INFO - NVCC: Cuda compilation tools, release 10.2, V10.2.8
10/09 10:57:38 - mmengine - INFO - GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
10/09 10:57:38 - mmengine - INFO - PyTorch: 1.12.1+cu102
10/09 10:57:38 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
10/09 10:57:38 - mmengine - INFO - TorchVision: 0.13.1+cu102
10/09 10:57:38 - mmengine - INFO - OpenCV: 4.8.0
10/09 10:57:38 - mmengine - INFO - MMEngine: 0.8.4
10/09 10:57:38 - mmengine - INFO - MMCV: 2.0.1
10/09 10:57:38 - mmengine - INFO - MMCV Compiler: GCC 7.3
10/09 10:57:38 - mmengine - INFO - MMCV CUDA Compiler: 10.2
10/09 10:57:38 - mmengine - INFO - MMDeploy: 1.3.0+59449cc
10/09 10:57:38 - mmengine - INFO -
10/09 10:57:38 - mmengine - INFO - **********Backend information**********
10/09 10:57:38 - mmengine - INFO - tensorrt: None
10/09 10:57:38 - mmengine - INFO - ONNXRuntime: 1.8.1
10/09 10:57:38 - mmengine - INFO - ONNXRuntime-gpu: None
10/09 10:57:38 - mmengine - INFO - ONNXRuntime custom ops: Available
10/09 10:57:38 - mmengine - INFO - pplnn: None
10/09 10:57:38 - mmengine - INFO - ncnn: None
10/09 10:57:38 - mmengine - INFO - snpe: None
10/09 10:57:38 - mmengine - INFO - openvino: None
10/09 10:57:38 - mmengine - INFO - torchscript: 1.12.1+cu102
10/09 10:57:38 - mmengine - INFO - torchscript custom ops: NotAvailable
10/09 10:57:38 - mmengine - INFO - rknn-toolkit: None
10/09 10:57:38 - mmengine - INFO - rknn-toolkit2: None
10/09 10:57:38 - mmengine - INFO - ascend: None
10/09 10:57:38 - mmengine - INFO - coreml: None
10/09 10:57:38 - mmengine - INFO - tvm: None
10/09 10:57:38 - mmengine - INFO - vacc: None
10/09 10:57:38 - mmengine - INFO -
10/09 10:57:38 - mmengine - INFO - **********Codebase information**********
10/09 10:57:38 - mmengine - INFO - mmdet: 3.1.0
10/09 10:57:38 - mmengine - INFO - mmseg: 1.0.0
10/09 10:57:38 - mmengine - INFO - mmpretrain: None
10/09 10:57:38 - mmengine - INFO - mmocr: None
10/09 10:57:38 - mmengine - INFO - mmagic: None
10/09 10:57:38 - mmengine - INFO - mmdet3d: None
10/09 10:57:38 - mmengine - INFO - mmpose: 1.1.0
10/09 10:57:38 - mmengine - INFO - mmrotate: None
10/09 10:57:38 - mmengine - INFO - mmaction: None
10/09 10:57:38 - mmengine - INFO - mmrazor: None
10/09 10:57:38 - mmengine - INFO - mmyolo: None
Error traceback
10/09 10:59:00 - mmengine - WARNING - Failed to search registry with scope "mmseg" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmseg" is a correct scope, or whether the registry is initialized.
10/09 10:59:00 - mmengine - WARNING - Failed to search registry with scope "mmseg" in the "mmseg_tasks" registry tree. As a workaround, the current "mmseg_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmseg" is a correct scope, or whether the registry is initialized.
10/09 10:59:01 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
10/09 10:59:02 - mmengine - WARNING - Failed to search registry with scope "mmseg" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmseg" is a correct scope, or whether the registry is initialized.
10/09 10:59:02 - mmengine - WARNING - Failed to search registry with scope "mmseg" in the "mmseg_tasks" registry tree. As a workaround, the current "mmseg_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmseg" is a correct scope, or whether the registry is initialized.
/media/veily/other/winnie_xiong/segment/mmsegmentation/mmseg/models/builder.py:36: UserWarning: ``build_loss`` would be deprecated soon, please use ``mmseg.registry.MODELS.build()``
warnings.warn('``build_loss`` would be deprecated soon, please use '
/media/veily/other/winnie_xiong/segment/mmsegmentation/mmseg/models/losses/cross_entropy_loss.py:235: UserWarning: Default ``avg_non_ignore`` is False, if you would like to ignore the certain label and average loss over non-ignore labels, which is the same with PyTorch official cross_entropy, set ``avg_non_ignore=True``.
warnings.warn(
Loads checkpoint by local backend from path: /media/veily/other/winnie_xiong/segment/mmsegmentation/configs/A-myconfig/iter_40000_12.pth
*********************checkpoint /media/veily/other/winnie_xiong/segment/mmsegmentation/configs/A-myconfig/iter_40000_12.pth
Process Process-2:
Traceback (most recent call last):
File "/media/veily/work/envs/openmmlab/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/media/veily/work/envs/openmmlab/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/media/veily/other/winnie_xiong/segment/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
ret = func(*args, **kwargs)
File "/media/veily/other/winnie_xiong/segment/mmdeploy/mmdeploy/apis/pytorch2onnx.py", line 63, in torch2onnx
torch_model = task_processor.build_pytorch_model(model_checkpoint)
File "/media/veily/other/winnie_xiong/segment/mmdeploy/mmdeploy/codebase/base/task.py", line 123, in build_pytorch_model
load_checkpoint(model, model_checkpoint, map_location=self.device)
File "/media/veily/work/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/checkpoint.py", line 638, in load_checkpoint
checkpoint = _load_checkpoint(filename, map_location, logger)
File "/media/veily/work/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/checkpoint.py", line 550, in _load_checkpoint
return CheckpointLoader.load_checkpoint(filename, map_location, logger)
File "/media/veily/work/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/checkpoint.py", line 330, in load_checkpoint
return checkpoint_loader(filename, map_location)
File "/media/veily/work/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/checkpoint.py", line 348, in load_from_local
checkpoint = torch.load(filename, map_location=map_location)
File "/media/veily/work/envs/openmmlab/lib/python3.8/site-packages/torch/serialization.py", line 706, in load
if _is_torchscript_zip(opened_zipfile):
File "/media/veily/work/envs/openmmlab/lib/python3.8/site-packages/torch/serialization.py", line 1057, in _is_torchscript_zip
return 'constants.pkl' in zip_file.get_all_records()
RuntimeError: [enforce fail at inline_container.cc:250] . file in archive is not in a subdirectory archive/: A-myconfig/
10/09 10:59:03 - mmengine - ERROR - /media/veily/other/winnie_xiong/segment/mmdeploy/mmdeploy/apis/core/pipeline_manager.py - pop_mp_output - 80 - `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.
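The failure happens inside `torch.load`, before any ONNX export starts. A minimal standalone check along these lines (the checkpoint path is the one from the command above) can confirm whether the .pth file itself is readable. The error text suggests the zip archive's top-level entry is `A-myconfig/` rather than the usual `archive/` layout that `torch.save` writes, i.e. the file may be a re-zipped folder or an otherwise corrupted file rather than a valid PyTorch checkpoint.

```python
# Standalone sanity check for the checkpoint, independent of mmdeploy.
# Sketch only; adjust the path to wherever iter_40000_12.pth actually lives.
import zipfile

import torch

ckpt = '../mmsegmentation/configs/A-myconfig/iter_40000_12.pth'

# A checkpoint written by torch.save (zip format) keeps every entry under a
# single top-level directory such as 'archive/'. A top-level entry like
# 'A-myconfig/' means the file is not a valid torch.save archive.
with zipfile.ZipFile(ckpt) as zf:
    for name in zf.namelist()[:10]:
        print(name)

# This is essentially what mmengine's CheckpointLoader does; if it fails here
# with the same RuntimeError, the problem is the file, not the deploy config.
state = torch.load(ckpt, map_location='cpu')
print(type(state), list(state)[:5] if isinstance(state, dict) else '')
```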
From this error, it looks like the process has not even reached the model conversion step yet.
Take a look at your model config (ocrnet_hr48_4xb2-160k_cityscapes-512x1024_road.py) and see whether it contains something like `load_from`; then check whether the specified local path actually contains the corresponding file.
Thanks for the reply. My config is as follows:

_base_ = ['./ocrnet_hr18.py', './cityscapes.py']
default_scope = 'mmseg'
crop_size = (512, 1024)
data_preprocessor = dict(size=crop_size)
dataset_type = 'BadRoadDataset'
data_root = '/media/veily/work/cong/mmsegmentation/data/data_train/roadcrack/'
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    data_preprocessor=data_preprocessor,
    pretrained='open-mmlab://msra/hrnetv2_w48',
    backbone=dict(
        extra=dict(
            stage2=dict(num_channels=(48, 96)),
            stage3=dict(num_channels=(48, 96, 192)),
            stage4=dict(num_channels=(48, 96, 192, 384)))),
    decode_head=[
        dict(
            type='FCNHead',
            in_channels=[48, 96, 192, 384],
            channels=sum([48, 96, 192, 384]),
            input_transform='resize_concat',
            in_index=(0, 1, 2, 3),
            kernel_size=1,
            num_convs=1,
            norm_cfg=norm_cfg,
            concat_input=False,
            dropout_ratio=-1,
            num_classes=4,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss',
                use_sigmoid=False,
                loss_weight=0.4,
                # class_weight=[0.8554, 0.8066, 1.1529, 1.2011]
            )),
        dict(
            type='OCRHead',
            in_channels=[48, 96, 192, 384],
            channels=512,
            ocr_channels=256,
            input_transform='resize_concat',
            in_index=(0, 1, 2, 3),
            norm_cfg=norm_cfg,
            dropout_ratio=-1,
            num_classes=4,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0))
    ])
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    # dict(type='RGB2Gray'),
    dict(
        type='RandomResize',
        scale=(2048, 1024),
        ratio_range=(0.5, 2.0),
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs')
]
train_dataloader = dict(
    # batch_size=2,
    # num_workers=2,
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='leftImg8bit/918_gong_img',
            seg_map_path='gtFine/918_gong_mask')))
val_dataloader = dict(
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='leftImg8bit/918_gong_img',
            seg_map_path='gtFine/918_gong_mask')))
test_dataloader = val_dataloader

The corresponding files do exist at those local paths.
If you are going to paste the config, please paste the whole thing, since yours inherits from `_base_ = ['./ocrnet_hr18.py', './cityscapes.py']`.
You can load it with `Config` first and then dump it.
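For reference, the suggested load-then-dump step is only a couple of lines with mmengine (the config path is assumed from the command above): `Config.fromfile` resolves the `_base_` inheritance and `dump` writes out the flattened result.

```python
from mmengine import Config

# Resolve the _base_ inheritance and write out the flattened config,
# which is what should be pasted into the issue.
cfg = Config.fromfile(
    '../mmsegmentation/configs/A-myconfig/'
    'ocrnet_hr48_4xb2-160k_cityscapes-512x1024_road.py')
cfg.dump('ocrnet_hr48_road_full.py')
```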
OK, here is my full config:

crop_size = (512, 1024)
data_preprocessor = dict(
    bgr_to_rgb=True,
    mean=[123.675, 116.28, 103.53],
    pad_val=0,
    seg_pad_val=255,
    size=(512, 1024),
    std=[58.395, 57.12, 57.375],
    type='SegDataPreProcessor')
data_root = '/media/veily/work/cong/mmsegmentation/data/data_train/roadcrack/'
dataset_type = 'BadRoadDataset'
default_hooks = dict(
    checkpoint=dict(by_epoch=False, interval=16000, type='CheckpointHook'),
    logger=dict(interval=50, log_metric_by_epoch=False, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(type='SegVisualizationHook'))
default_scope = 'mmseg'
env_cfg = dict(
    cudnn_benchmark=True,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
img_ratios = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=False)
model = dict(
    backbone=dict(
        extra=dict(
            stage1=dict(
                block='BOTTLENECK',
                num_blocks=(4, ),
                num_branches=1,
                num_channels=(64, ),
                num_modules=1),
            stage2=dict(
                block='BASIC',
                num_blocks=(4, 4),
                num_branches=2,
                num_channels=(18, 36),
                num_modules=1),
            stage3=dict(
                block='BASIC',
                num_blocks=(4, 4, 4),
                num_branches=3,
                num_channels=(18, 36, 72),
                num_modules=4),
            stage4=dict(
                block='BASIC',
                num_blocks=(4, 4, 4, 4),
                num_branches=4,
                num_channels=(18, 36, 72, 144),
                num_modules=3)),
        norm_cfg=dict(requires_grad=True, type='SyncBN'),
        norm_eval=False,
        type='HRNet'),
    data_preprocessor=dict(
        bgr_to_rgb=True,
        mean=[123.675, 116.28, 103.53],
        pad_val=0,
        seg_pad_val=255,
        size=(512, 1024),
        std=[58.395, 57.12, 57.375],
        type='SegDataPreProcessor'),
    decode_head=[
        dict(
            align_corners=False,
            channels=sum([18, 36, 72, 144]),
            concat_input=False,
            dropout_ratio=-1,
            in_channels=[18, 36, 72, 144],
            in_index=(0, 1, 2, 3),
            input_transform='resize_concat',
            kernel_size=1,
            loss_decode=dict(
                loss_weight=0.4, type='CrossEntropyLoss', use_sigmoid=False),
            norm_cfg=dict(requires_grad=True, type='SyncBN'),
            num_classes=19,
            num_convs=1,
            type='FCNHead'),
        dict(
            align_corners=False,
            channels=512,
            dropout_ratio=-1,
            in_channels=[48, 96, 192, 384],
            in_index=(0, 1, 2, 3),
            input_transform='resize_concat',
            loss_decode=dict(
                loss_weight=0.4, type='CrossEntropyLoss', use_sigmoid=False),
            norm_cfg=dict(requires_grad=True, type='SyncBN'),
            num_classes=19,
            ocr_channels=256,
            type='OCRHead'),
    ],
    num_stages=2,
    pretrained='open-mmlab://msra/hrnetv2_w18',
    test_cfg=dict(mode='whole'),
    train_cfg=dict(),
    type='CascadeEncoderDecoder')
norm_cfg = dict(requires_grad=True, type='SyncBN')
optim_wrapper = dict(
    clip_grad=None,
    optimizer=dict(lr=0.01, momentum=0.9, type='SGD', weight_decay=0.0005),
    type='OptimWrapper')
optimizer = dict(lr=0.01, momentum=0.9, type='SGD', weight_decay=0.0005)
param_scheduler = [
    dict(
        begin=0,
        by_epoch=False,
        end=160000,
        eta_min=0.0001,
        power=0.9,
        type='PolyLR'),
]
resume = False
test_cfg = dict(type='TestLoop')
test_dataloader = dict(
    batch_size=1,
    dataset=dict(
        data_prefix=dict(
            img_path='leftImg8bit/val', seg_map_path='gtFine/val'),
        data_root='data/cityscapes/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(keep_ratio=True, scale=(2048, 1024), type='Resize'),
            dict(type='LoadAnnotations'),
            dict(type='PackSegInputs'),
        ],
        type='CityscapesDataset'),
    num_workers=4,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(iou_metrics=['mIoU'], type='IoUMetric')
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(keep_ratio=True, scale=(2048, 1024), type='Resize'),
    dict(type='LoadAnnotations'),
    dict(type='PackSegInputs'),
]
train_cfg = dict(
    max_iters=160000, type='IterBasedTrainLoop', val_interval=16000)
train_dataloader = dict(
    batch_size=2,
    dataset=dict(
        data_prefix=dict(
            img_path='leftImg8bit/train', seg_map_path='gtFine/train'),
        data_root='data/cityscapes/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
            dict(
                keep_ratio=True,
                ratio_range=(0.5, 2.0),
                scale=(2048, 1024),
                type='RandomResize'),
            dict(cat_max_ratio=0.75, crop_size=(512, 1024), type='RandomCrop'),
            dict(prob=0.5, type='RandomFlip'),
            dict(type='PhotoMetricDistortion'),
            dict(type='PackSegInputs'),
        ],
        type='CityscapesDataset'),
    num_workers=2,
    persistent_workers=True,
    sampler=dict(shuffle=True, type='InfiniteSampler'))
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(
        keep_ratio=True,
        ratio_range=(0.5, 2.0),
        scale=(2048, 1024),
        type='RandomResize'),
    dict(cat_max_ratio=0.75, crop_size=(512, 1024), type='RandomCrop'),
    dict(prob=0.5, type='RandomFlip'),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs'),
]
tta_model = dict(type='SegTTAModel')
tta_pipeline = [
    dict(backend_args=None, type='LoadImageFromFile'),
    dict(
        transforms=[
            [
                dict(keep_ratio=True, scale_factor=0.5, type='Resize'),
                dict(keep_ratio=True, scale_factor=0.75, type='Resize'),
                dict(keep_ratio=True, scale_factor=1.0, type='Resize'),
                dict(keep_ratio=True, scale_factor=1.25, type='Resize'),
                dict(keep_ratio=True, scale_factor=1.5, type='Resize'),
                dict(keep_ratio=True, scale_factor=1.75, type='Resize'),
            ],
            [
                dict(direction='horizontal', prob=0.0, type='RandomFlip'),
                dict(direction='horizontal', prob=1.0, type='RandomFlip'),
            ],
            [dict(type='LoadAnnotations')],
            [dict(type='PackSegInputs')],
        ],
        type='TestTimeAug'),
]
val_cfg = dict(type='ValLoop')
val_dataloader = dict(
    batch_size=1,
    dataset=dict(
        data_prefix=dict(
            img_path='leftImg8bit/val', seg_map_path='gtFine/val'),
        data_root='/media/veily/work/cong/mmsegmentation/data/data_train/roadcrack/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(keep_ratio=True, scale=(2048, 1024), type='Resize'),
            dict(type='LoadAnnotations'),
            dict(type='PackSegInputs'),
        ],
        type='CityscapesDataset'),
    num_workers=4,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(iou_metrics=['mIoU'], type='IoUMetric')
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    name='visualizer',
    type='SegLocalVisualizer',
    vis_backends=[dict(type='LocalVisBackend')])