
[MISC] Distill-DSM Model

Open Rakshith2597 opened this issue 3 years ago • 10 comments

Submitting training module for Distill DSM: A computationally efficient method for segmentation of medical imaging volumes.

Paper: MIDL 2021
Dataset used for this code repo: Medical Decathlon

This is part of the project MIRIAD: Many Incarnations of Screening of Radiology for High Throughput Disease Screening via Multiple Instance Reinforcement Learning with Adversarial Deep Neural Networks, sponsored by INTEL TECHNOLOGY INDIA PVT. LTD.

Principal Investigators: Dr Debdoot Sheet (PI), Dr Nirmalya Ghosh (Co-PI) Department of Electrical Engineering Indian Institute of Technology Kharagpur

Dr Ramanathan Sethuraman (Co-PI) Intel Technology India Pvt. Ltd.

Rakshith2597 avatar Oct 10 '21 16:10 Rakshith2597

Conversion of the model from ONNX to OpenVINO IR fails with the following error.

E subprocess.CalledProcessError: Command 'mo --framework onnx --input_model model_weights/distill_dsm.onnx --input_shape "[2, 1, 128, 160, 160]" --log_level DEBUG' returned non-zero exit status 127.

What could be the possible reasons for this?
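One clue is the exit status itself: 127 is the shell's conventional code for "command not found", which would mean the `mo` executable is not on PATH in the environment the subprocess runs in (for example, if the OpenVINO setup script was not sourced). A quick way to see that failure mode, using a deliberately bogus command name:

```python
import subprocess

# Exit status 127 conventionally means the spawned shell could not find the
# command itself -- the same status the `mo` invocation above returned.
proc = subprocess.run("definitely-not-a-real-command", shell=True,
                      capture_output=True)
print(proc.returncode)  # 127 on POSIX shells
```

If `mo --version` fails the same way in a plain terminal, the fix is to activate the virtualenv or source the OpenVINO environment script before calling it.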

Rakshith2597 avatar Oct 10 '21 16:10 Rakshith2597

> Conversion of the model from ONNX to OpenVINO IR fails with the following error.
>
> E subprocess.CalledProcessError: Command 'mo --framework onnx --input_model model_weights/distill_dsm.onnx --input_shape "[2, 1, 128, 160, 160]" --log_level DEBUG' returned non-zero exit status 127.
>
> What could be the possible reasons for this?

Are there any additional details available? Could you, please, provide the full log from the model conversion command? You can try to convert the model in the terminal as a separate command (not as part of subprocess.call).

morkovka1337 avatar Oct 11 '21 05:10 morkovka1337

> Are there any additional details available? Could you, please, provide the full log from the model conversion command? You can try to convert the model in the terminal as a separate command (not as part of subprocess.call).

Here is the full log.

Model Optimizer arguments:
Common parameters:
  - Path to the Input Model: /home/rakshith/bmi7/training_extensions/misc/pytorch_toolkit/distilldsm/model_weights/distill_dsm.onnx
  - Path for generated IR: /home/rakshith/bmi7/training_extensions/misc/pytorch_toolkit/distilldsm/.
  - IR output name: distill_dsm
  - Log level: ERROR
  - Batch: Not specified, inherited from the model
  - Input layers: Not specified, inherited from the model
  - Output layers: Not specified, inherited from the model
  - Input shapes: [2,1,128,160,160]
  - Mean values: Not specified
  - Scale values: Not specified
  - Scale factor: Not specified
  - Precision of IR: FP32
  - Enable fusing: True
  - Enable grouped convolutions fusing: True
  - Move mean values to preprocess section: None
  - Reverse input channels: False
ONNX specific parameters:
  - Inference Engine found in: /home/rakshith/bmi7/training_extensions/misc/pytorch_toolkit/distilldsm/venv/lib/python3.6/site-packages/openvino
Inference Engine version: 2021.4.1-3926-14e67d86634-releases/2021/4
Model Optimizer version: 2021.4.1-3926-14e67d86634-releases/2021/4
[ ERROR ] Cannot infer shapes or values for node "Slice_49".
[ ERROR ] Output shape: [256 0 80 80] of node "Slice_49" contains non-positive values
[ ERROR ] It can happen due to bug in custom shape infer function <function Slice.infer at 0x7f4814f3ff28>.
[ ERROR ] Or because the node inputs have incorrect values/shapes.
[ ERROR ] Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ] Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ] Exception occurred during running replacer "REPLACEMENT_ID" (<class 'extensions.middle.PartialInfer.PartialInfer'>): Stopped shape/value propagation at "Slice_49" node. For more information please refer to Model Optimizer FAQ, question #38. (https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html?question=38#question-38)

The input shape is the same as the one I had used to train the .pth model and also to convert it into ONNX.

Rakshith2597 avatar Oct 12 '21 08:10 Rakshith2597

You can try two options:

  1. Check that the ONNX model works correctly on some small dataset. Sometimes it can be converted without errors but cannot be inferred properly.
  2. Sorry, the second option does not apply to this case, never mind.

morkovka1337 avatar Oct 12 '21 08:10 morkovka1337

Also, you can try moving the return statement earlier in the model's forward function to pinpoint which concrete operation produces this error.

morkovka1337 avatar Oct 12 '21 09:10 morkovka1337

> 1. Check that the ONNX model works correctly on some small dataset. Sometimes it can be converted without errors but cannot be inferred properly.

You are correct. The model got converted without errors, but inference fails.

Rakshith2597 avatar Oct 14 '21 08:10 Rakshith2597

Hi, @Rakshith2597! Can you please check whether I am reproducing the model conversion correctly?

import torch

net = U_Net(1, 2, conv_type='conv_2d', tsm=True, learn=True)
net.eval()
dummy_inp = torch.randn([1, 1, 128, 160, 160])
torch.onnx.export(net, dummy_inp, "model.onnx", opset_version=11)

with U_Net from https://github.com/Rakshith2597/training_extensions/blob/bmi7/misc/pytorch_toolkit/distilldsm/src/models/UNetDistillDSM.py.

I've found a place where torch.split produces a zero-sized split:

shift_tensor, main_tensor = tensor.split([split_size*2, C - 2 * split_size], dim=1)

tensor.shape is [128, 32, 160, 160] but self.split_size is 16, so we get tensor.split([32, 0], dim=1) and main_tensor ends up with zero channels. Is that intentional?
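A minimal repro of that degenerate split, using the shapes quoted above, shows the kind of zero-sized tensor that the Model Optimizer's shape inference rejects as "non-positive":

```python
import torch

# With C == 32 and split_size == 16, the second chunk gets
# size C - 2 * split_size == 0 channels.
t = torch.randn(128, 32, 160, 160)
shift_tensor, main_tensor = t.split([32, 0], dim=1)
print(main_tensor.shape)  # torch.Size([128, 0, 160, 160])
```

PyTorch tolerates the empty tensor at runtime, but once exported to ONNX the corresponding Slice node carries a zero dimension, which plausibly matches the "contains non-positive values" error in the log above.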


If that's expected, please apply this patch to make model OpenVINO compatible:

@@ -107,7 +107,11 @@ class learnTSM(nn.Module):
         shape = T, C, H, W = tensor.shape
         split_size = self.split_size
 
-        shift_tensor, main_tensor = tensor.split([split_size*2, C - 2 * split_size], dim=1)
+        if split_size * 2 == tensor.shape[1]:
+            shift_tensor, main_tensor = tensor, None
+        else:
+            shift_tensor, main_tensor = tensor.split([split_size*2, C - 2 * split_size], dim=1)
+
         # pre_tensor, post_tensor = shift_tensor.split([split_size, split_size], dim=1)
         pre_tensor = shift_tensor
         post_tensor = shift_tensor
@@ -115,7 +119,8 @@ class learnTSM(nn.Module):
         main_conv_tensor = self.main_conv(shift_tensor).view(T//tsm_length, tsm_length, split_size, H, W)
         pre_tensor = self.pre_conv(pre_tensor).view(T//tsm_length, tsm_length, split_size//2, H, W)
         post_tensor = self.post_conv(post_tensor).view(T//tsm_length, tsm_length, split_size//2, H, W)
-        main_tensor = main_tensor.view(T//tsm_length, tsm_length, C - 2*split_size, H, W)
+        if main_tensor is not None:
+            main_tensor = main_tensor.view(T//tsm_length, tsm_length, C - 2*split_size, H, W)
 
         if self.version == 'zero':
             pre_tensor  = F.pad(pre_tensor,  (0, 0, 0, 0, 0, 0, 1, 0))[:,  :-1, ...]  # NOQA
@@ -126,7 +131,10 @@ class learnTSM(nn.Module):
             post_conv_tensor = torch.cat((post_conv_tensor[:,  1:  , ...],  # NOQA
                                      post_conv_tensor[:,   :1 , ...]), dim=1)  # NOQA
         # print(pre_tensor.shape, post_tensor.shape, main_conv_tensor.shape, main_tensor.shape, shape)
-        return torch.cat((pre_tensor, post_tensor, main_conv_tensor, main_tensor), dim=2).view(shape)
+        if main_tensor is not None:
+            return torch.cat((pre_tensor, post_tensor, main_conv_tensor, main_tensor), dim=2).view(shape)
+        else:
+            return torch.cat((pre_tensor, post_tensor, main_conv_tensor), dim=2).view(shape)
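Extracted as a standalone helper, the guard in the patch behaves like this (the function name is illustrative, not part of the actual code):

```python
import torch

# Mirror of the patch's guard: skip the split entirely when 2 * split_size
# already consumes every channel, so no zero-sized tensor is created.
def safe_split(tensor, split_size):
    C = tensor.shape[1]
    if split_size * 2 == C:
        return tensor, None
    return tensor.split([split_size * 2, C - 2 * split_size], dim=1)

shift, main = safe_split(torch.randn(128, 32, 160, 160), 16)
print(shift.shape, main)  # torch.Size([128, 32, 160, 160]) None
```

Downstream code then checks `main_tensor is not None` before the view and concat, keeping behavior identical whenever the split is non-degenerate.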

Tested accuracy (with OpenVINO 2021.4):

import numpy as np
import torch
from openvino.inference_engine import IECore

net = U_Net(1, 2, conv_type='conv_2d', tsm=True, learn=True)
net.eval()
dummy_inp = torch.randn([1, 1, 128, 160, 160])
torch.onnx.export(net, dummy_inp, "model.onnx", opset_version=11,
                  input_names=["input"], output_names=["output"])

inp = torch.randn([1, 1, 128, 160, 160])
ref = net(inp)

ie = IECore()
exec_net = ie.load_network("model.onnx", "CPU")
out = exec_net.infer({"input": inp})["output"]
print(ref.shape)
print(out.shape)
print(np.max(np.abs(ref.detach().numpy() - out)))

max diff: 6.7055225e-08

dkurt avatar Jan 28 '22 07:01 dkurt

Can one of the admins verify this patch?

nervana-ff avatar Feb 15 '22 15:02 nervana-ff

@goodsong81 can your team take a look at this?

ryanloney avatar May 09 '22 18:05 ryanloney

Please resolve the merge conflicts then mark this PR as 'ready for review'.

goodsong81 avatar May 18 '22 07:05 goodsong81