[MISC] Distill-DSM Model
Submitting a training module for Distill-DSM: a computationally efficient method for segmentation of medical imaging volumes.
Paper: MIDL 2021. Dataset used for this code repo: Medical Decathlon.
This is part of the project MIRIAD: Many Incarnations of Screening of Radiology for High Throughput Disease Screening via Multiple Instance Reinforcement Learning with Adversarial Deep Neural Networks, sponsored by INTEL TECHNOLOGY INDIA PVT. LTD.
Principal Investigators:
- Dr Debdoot Sheet (PI), Dr Nirmalya Ghosh (Co-PI), Department of Electrical Engineering, Indian Institute of Technology Kharagpur
- Dr Ramanathan Sethuraman (Co-PI), Intel Technology India Pvt. Ltd.
Conversion of the model from ONNX to OpenVINO IR fails with the following error.
E subprocess.CalledProcessError: Command 'mo --framework onnx --input_model model_weights/distill_dsm.onnx --input_shape "[2, 1, 128, 160, 160]" --log_level DEBUG' returned non-zero exit status 127.
What could be the possible reasons for this?
Are there any additional details available? Could you please provide the full log from the model conversion command? You can try to convert the model in the terminal as a separate command (not as part of subprocess.call).
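If you do need to run it from Python, here is a minimal sketch (reusing the command from the error above) that captures the full output for diagnosis; note that exit status 127 usually means the shell could not find the mo executable, e.g. because the OpenVINO environment is not active in the subprocess:

import subprocess

# Sketch: run the same Model Optimizer command and capture both streams
# so the full failure log is visible. Exit status 127 typically means
# the `mo` executable was not found on PATH (e.g. the OpenVINO
# virtualenv was not activated for the subprocess).
cmd = [
    "mo", "--framework", "onnx",
    "--input_model", "model_weights/distill_dsm.onnx",
    "--input_shape", "[2, 1, 128, 160, 160]",
    "--log_level", "DEBUG",
]
result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        universal_newlines=True)
print("exit status:", result.returncode)
print(result.stdout)
print(result.stderr)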
Here is the full log.
Model Optimizer arguments:
Common parameters:
    - Path to the Input Model: /home/rakshith/bmi7/training_extensions/misc/pytorch_toolkit/distilldsm/model_weights/distill_dsm.onnx
    - Path for generated IR: /home/rakshith/bmi7/training_extensions/misc/pytorch_toolkit/distilldsm/.
    - IR output name: distill_dsm
    - Log level: ERROR
    - Batch: Not specified, inherited from the model
    - Input layers: Not specified, inherited from the model
    - Output layers: Not specified, inherited from the model
    - Input shapes: [2,1,128,160,160]
    - Mean values: Not specified
    - Scale values: Not specified
    - Scale factor: Not specified
    - Precision of IR: FP32
    - Enable fusing: True
    - Enable grouped convolutions fusing: True
    - Move mean values to preprocess section: None
    - Reverse input channels: False
ONNX specific parameters:
    - Inference Engine found in: /home/rakshith/bmi7/training_extensions/misc/pytorch_toolkit/distilldsm/venv/lib/python3.6/site-packages/openvino
Inference Engine version: 2021.4.1-3926-14e67d86634-releases/2021/4
Model Optimizer version: 2021.4.1-3926-14e67d86634-releases/2021/4
[ ERROR ]  Cannot infer shapes or values for node "Slice_49".
[ ERROR ]  Output shape: [256 0 80 80] of node "Slice_49" contains non-positive values
[ ERROR ]
[ ERROR ]  It can happen due to bug in custom shape infer function <function Slice.infer at 0x7f4814f3ff28>.
[ ERROR ]  Or because the node inputs have incorrect values/shapes.
[ ERROR ]  Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ]  Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ]  Exception occurred during running replacer "REPLACEMENT_ID" (<class 'extensions.middle.PartialInfer.PartialInfer'>): Stopped shape/value propagation at "Slice_49" node. For more information please refer to Model Optimizer FAQ, question #38. (https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html?question=38#question-38)
The input shape is the same one I used to train the .pth model and to convert it to ONNX.
You can try two options:
- Check that the ONNX model works correctly on some small dataset. Sometimes a model can be converted without errors but cannot be inferred properly.
- Sorry, the second option does not apply in this case; never mind.
Also, you can try moving the return statement in the model's forward function to pin down which concrete operation produces this error, as sketched below.
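To illustrate the return-moving idea, here is a hypothetical sketch (TinyNet and its layers are stand-ins, not the real U_Net): return an intermediate tensor early from forward, re-export, and repeat until the failing operation is isolated.

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    # Hypothetical stand-in for the real model, only to show the idea.
    def __init__(self):
        super().__init__()
        self.block1 = nn.Conv3d(1, 4, 3, padding=1)
        self.block2 = nn.Conv3d(4, 4, 3, padding=1)

    def forward(self, x):
        x = self.block1(x)
        return x  # early return: the exported graph now ends after block1
        # x = self.block2(x)  # temporarily skipped while bisecting
        # return x

# If exporting at this cut point succeeds but a later cut point fails,
# the failing operation lies between the two return positions.
torch.onnx.export(TinyNet().eval(), torch.randn(1, 1, 8, 16, 16),
                  "partial.onnx", opset_version=11)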
> Check that the ONNX model works correctly on some small dataset. Sometimes a model can be converted without errors but cannot be inferred properly.

You are correct. The model was converted without errors but cannot be inferred.
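For reference, one way to run that check is with onnxruntime; a minimal sketch (assuming onnxruntime is installed, and reusing the model path from the log above):

import numpy as np
import onnxruntime as ort

# Sketch: feed a random volume through the exported ONNX model to see
# whether inference fails independently of OpenVINO.
sess = ort.InferenceSession("model_weights/distill_dsm.onnx")
inp_name = sess.get_inputs()[0].name
dummy = np.random.randn(2, 1, 128, 160, 160).astype(np.float32)
out = sess.run(None, {inp_name: dummy})[0]
print(out.shape)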
Hi, @Rakshith2597! Can you please check whether I am reproducing the model conversion correctly?

import torch
# U_Net is defined in src/models/UNetDistillDSM.py of this repo
from src.models.UNetDistillDSM import U_Net

net = U_Net(1, 2, conv_type='conv_2d', tsm=True, learn=True)
net.eval()
dummy_inp = torch.randn([1, 1, 128, 160, 160])
torch.onnx.export(net, dummy_inp, "model.onnx", opset_version=11)

with U_Net taken from https://github.com/Rakshith2597/training_extensions/blob/bmi7/misc/pytorch_toolkit/distilldsm/src/models/UNetDistillDSM.py.
I've found a place where torch.split produces a zero-sized chunk:

shift_tensor, main_tensor = tensor.split([split_size*2, C - 2 * split_size], dim=1)

Here tensor.shape is [128, 32, 160, 160] but self.split_size is 16, so this becomes tensor.split([32, 0], dim=1) and main_tensor has zero channels. Is that intentional?
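For concreteness, a minimal reproduction of that zero-sized split with the shapes quoted above:

import torch

# With 32 channels and split_size = 16, the second chunk of the split
# has zero channels, which Model Optimizer later rejects as a
# non-positive output shape during Slice shape inference.
tensor = torch.randn(128, 32, 160, 160)
split_size = 16
C = tensor.shape[1]
shift_tensor, main_tensor = tensor.split([split_size * 2, C - 2 * split_size], dim=1)
print(shift_tensor.shape)  # torch.Size([128, 32, 160, 160])
print(main_tensor.shape)   # torch.Size([128, 0, 160, 160])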
If that's expected, please apply this patch to make the model OpenVINO-compatible:
@@ -107,7 +107,11 @@ class learnTSM(nn.Module):
         shape = T, C, H, W = tensor.shape
         split_size = self.split_size
-        shift_tensor, main_tensor = tensor.split([split_size*2, C - 2 * split_size], dim=1)
+        if split_size * 2 == tensor.shape[1]:
+            shift_tensor, main_tensor = tensor, None
+        else:
+            shift_tensor, main_tensor = tensor.split([split_size*2, C - 2 * split_size], dim=1)
+
         # pre_tensor, post_tensor = shift_tensor.split([split_size, split_size], dim=1)
         pre_tensor = shift_tensor
         post_tensor = shift_tensor
@@ -115,7 +119,8 @@ class learnTSM(nn.Module):
         main_conv_tensor = self.main_conv(shift_tensor).view(T//tsm_length, tsm_length, split_size, H, W)
         pre_tensor = self.pre_conv(pre_tensor).view(T//tsm_length, tsm_length, split_size//2, H, W)
         post_tensor = self.post_conv(post_tensor).view(T//tsm_length, tsm_length, split_size//2, H, W)
-        main_tensor = main_tensor.view(T//tsm_length, tsm_length, C - 2*split_size, H, W)
+        if main_tensor is not None:
+            main_tensor = main_tensor.view(T//tsm_length, tsm_length, C - 2*split_size, H, W)

         if self.version == 'zero':
             pre_tensor = F.pad(pre_tensor, (0, 0, 0, 0, 0, 0, 1, 0))[:, :-1, ...]  # NOQA
@@ -126,7 +131,10 @@ class learnTSM(nn.Module):
             post_conv_tensor = torch.cat((post_conv_tensor[:, 1: , ...],  # NOQA
                                           post_conv_tensor[:, :1 , ...]), dim=1)  # NOQA
         # print(pre_tensor.shape, post_tensor.shape, main_conv_tensor.shape, main_tensor.shape, shape)
-        return torch.cat((pre_tensor, post_tensor, main_conv_tensor, main_tensor), dim=2).view(shape)
+        if main_tensor is not None:
+            return torch.cat((pre_tensor, post_tensor, main_conv_tensor, main_tensor), dim=2).view(shape)
+        else:
+            return torch.cat((pre_tensor, post_tensor, main_conv_tensor), dim=2).view(shape)
Tested accuracy (with OpenVINO 2021.4):
import numpy as np
import torch
from openvino.inference_engine import IECore
from src.models.UNetDistillDSM import U_Net

net = U_Net(1, 2, conv_type='conv_2d', tsm=True, learn=True)
net.eval()
dummy_inp = torch.randn([1, 1, 128, 160, 160])
torch.onnx.export(net, dummy_inp, "model.onnx", opset_version=11,
                  input_names=["input"], output_names=["output"])

inp = torch.randn([1, 1, 128, 160, 160])
ref = net(inp)  # reference output from PyTorch

ie = IECore()
exec_net = ie.load_network("model.onnx", "CPU")
out = exec_net.infer({"input": inp})["output"]

print(ref.shape)
print(out.shape)
print(np.max(np.abs(ref.detach().numpy() - out)))
max diff: 6.7055225e-08
Can one of the admins verify this patch?
@goodsong81 can your team take a look at this?
Please resolve the merge conflicts, then mark this PR as 'ready for review'.