
sparseinst onnx result issue

Open NEFUJoeyChen opened this issue 2 years ago • 11 comments

Hello, thank you for your nice work.

I trained SparseInst and used the latest convert_onnx.py to convert the model to ONNX successfully.

However, the results differ between the pth and ONNX models.

(1) It seems the ONNX result only has two outputs, scores and masks, while the pth result has three.

(2) I tried many images, but the ONNX scores are always low, and the masks are all filtered out to zero.

I have no idea whether there is a problem in my inference code. Is there an example script for running inference on an image with the ONNX model?

Thank you.

NEFUJoeyChen avatar Aug 12 '22 11:08 NEFUJoeyChen

Same question here. Is there any example of how to parse the ONNX results?

VladMVLX avatar Aug 13 '22 20:08 VladMVLX

Trying to run inference with the ONNX model converted from sparse_inst_r101_giam_7b62ea.pth by python convert_onnx.py --config-file configs/sparse_inst_r101_giam.yaml --width 640 --height 640 --opts MODEL.WEIGHTS sparse_inst_r101_giam_7b62ea.pth, on an RGB image normalized with means = { 123.675, 116.28, 103.53 } and std = { 58.395, 57.12, 57.375 }.

I dump all 100 mask outputs with a threshold of 0.45 and manually compare them to the pth model output: none of the masks produced by the ONNX model is even close to the results given by the original pth model.

I tried switching the channels from RGB to BGR, and also scaling the original image down while maintaining the aspect ratio, but nothing helps to achieve results like the pth model produces with python demo.py --config-file configs/sparse_inst_r101_giam.yaml --input colour_000000.jpg --output results --opts MODEL.WEIGHTS sparse_inst_r101_giam_7b62ea.pth on the same image.

With the scores I have the same issue as @NEFUJoeyChen: all scores are low and do not match the values the original pth model gives.

Any idea what ideal inference on the ONNX model should look like?
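
For reference, here is a minimal sketch of the ONNX Runtime inference I am describing; the exported file name, the output names ("scores", "masks"), and the fixed 640x640 input are assumptions and may not match the actual converter output:

```python
import cv2
import numpy as np
import onnxruntime as ort

# Assumed preprocessing: resize to the export size, normalize with the
# mean/std mentioned above, then convert to NCHW float32.
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)

def preprocess(path, size=(640, 640)):
    img = cv2.imread(path)                      # BGR, HWC, uint8
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # RGB (swap back if the model expects BGR)
    img = cv2.resize(img, size)
    img = (img.astype(np.float32) - MEAN) / STD
    return img.transpose(2, 0, 1)[None]         # 1x3xHxW

session = ort.InferenceSession("sparse_inst_r101_giam.onnx")  # hypothetical file name
inp_name = session.get_inputs()[0].name
scores, masks = session.run(None, {inp_name: preprocess("colour_000000.jpg")})

# Keep only confident instances; 0.45 is the threshold used above.
keep = scores.reshape(-1) > 0.45
print("instances above threshold:", keep.sum())
```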

VladMVLX avatar Aug 14 '22 09:08 VladMVLX

Hi @NEFUJoeyChen and @VladMVLX, I'm fixing it now.

wondervictor avatar Aug 14 '22 12:08 wondervictor

Hello @wondervictor @NEFUJoeyChen @VladMVLX, I am facing the same issue too and look forward to your reply. Thank you very much.

UPwangcheng avatar Aug 15 '22 04:08 UPwangcheng

Update :

I could get the masks, scores, and predicted classes as outputs by modifying forward_test. I added these few lines at the end of the definition:

```python
# appended at the end of forward_test
results = self.inference_test(output, images, max_shape, images.size())
processed_results = [{"instances": r} for r in results]
predictions = processed_results[0]
instances = predictions["instances"]
instances = instances[instances.scores > 0.5]
predictions["instances"] = instances
return instances.scores, instances.pred_classes, instances.pred_masks
```

The PNG image is the graph of my ONNX model, where you can see the three outputs. Waiting for the author to validate the method. [image: onnx_model_class]

As for converting it from ONNX to TensorRT, I got the same error as with the model from the PINTO/model_zoo GitHub:

```
[TensorRT] INFO: No importer registered for op: NonZero. Attempting to import as plugin.
[TensorRT] INFO: Searching for plugin: NonZero, plugin_version: 1, plugin_namespace:
[TensorRT] ERROR: 3: getPluginCreator could not find plugin: NonZero version: 1
ERROR: Failed to parse the ONNX file
```

This NonZero op appears right after the comparison node in the ONNX graph, which comes from keep = scores > self.cls_threshold. Any advice would be appreciated.
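
For anyone debugging this: the NonZero op is what the exporter emits for the boolean indexing that follows the comparison, not for the comparison itself. A hypothetical illustration (not the actual SparseInst code) of the pattern that triggers it, and a dense workaround that exports without NonZero:

```python
import torch

scores = torch.rand(100)           # per-proposal scores (illustrative)
masks = torch.rand(100, 160, 160)  # per-proposal mask logits (illustrative)

# This pattern exports as Greater -> NonZero -> Gather, which TensorRT rejects:
keep = scores > 0.005
kept_masks = masks[keep]

# Dense alternative: keep all 100 proposals inside the graph and let the host
# code do the filtering (or zero out rejected proposals by multiplication).
kept_masks_dense = masks * keep[:, None, None].float()
```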

GLHF

leandro-svg avatar Aug 18 '22 09:08 leandro-svg

Dear People,

I could solve the problems and parse the model to TensorRT. You need to remove the PPM_ONNX module to get decent results from ONNX, and then move the interpolate function out of the model. Here is a result I could get from TensorRT: [image: result_tensorrt]

leandro-svg avatar Aug 23 '22 13:08 leandro-svg

Hi @leandro-svg, could you share concrete instructions for your method? I ran into the same issue several days ago and still cannot find the right way to get correct results from the ONNX model or the TensorRT engine. I suspected the ONNX network produced by convert_onnx.py might not be correct. You mentioned that taking out PPM_ONNX works: do you mean commenting out the onnx_ppm when using convert_onnx.py? And in sparseinst.py, how do you move the interpolate function out of the model? Or did I misunderstand your solution? Thanks a lot if you can give a more detailed description; a PR to the author would be even better so we can all build a correct ONNX model and then a TRT engine. I am trying to follow your method.

LiZheng1997 avatar Sep 26 '22 04:09 LiZheng1997

Dear @LiZheng1997, before going into the explanation, I have put my scripts in the following repository: https://github.com/leandro-svg/SparseInst_TensorRT . I built everything on an edge device (Nvidia Jetson TX2), so it may not work at first on your computer, but the idea is there.

In the ONNX converter file :

  1. I indeed removed the ppm module at line 95; it wouldn't build with it, and the results were good enough without it.
  2. I wanted to get exactly the same output as the Pytorch model, so I changed the output names from ["scores", "masks"] to ["scores", "classes", "masks"] such that I could use the same post-processing as with the Pytorch model (see the export sketch after this list).
  3. I changed the dummy input, but I don't think it changed anything anyway.
  4. At line 107, model.forward is changed to model.forward_test. In order to get the desired output, I also changed the forward_test definition in the sparseinst file (see below).
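
A rough sketch of what the export call ends up looking like with those changes; the stand-in module, file name, input size, and opset below are assumptions for illustration, and only the three output names are the point:

```python
import torch
import torch.nn as nn

class FakeSparseInst(nn.Module):
    """Hypothetical stand-in with the same three-tensor output as the modified forward_test."""
    def forward(self, x):
        feats = x.mean(dim=(1, 2, 3))                  # dummy computation standing in for the network
        scores = feats.repeat(100)                     # (100,) instance scores
        classes = (scores * 0).long()                  # (100,) class indices
        masks = x[:, 0, :160, :160].repeat(100, 1, 1)  # (100, 160, 160) mask logits
        return scores, classes, masks

dummy = torch.randn(1, 3, 640, 640)  # assumed export resolution
torch.onnx.export(
    FakeSparseInst().eval(),
    dummy,
    "sparseinst.onnx",                            # hypothetical output path
    input_names=["input"],
    output_names=["scores", "classes", "masks"],  # the three outputs mentioned above
    opset_version=11,                             # assumed opset
)
```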

In SparseInst/sparseinst.py: following the last point in the previous section, I had to use the Pytorch model's post-processing to get the right output and to use the initial model.forward and inference definitions, but... In brief, the postprocess in the inference definition was using some functions that aren't compatible with TensorRT or ONNX, e.g.:

  • Line 167: the interpolation function
  • Line 148: keep = scores > self.cls_threshold (not accepted by TensorRT)
  • Line 112: returning a dictionary, which is not accepted by TensorRT or ONNX

So basically, and I have to admit it isn't optimal at all, I had to move a big part of the postprocess outside of the model. For example, the upsampling (interpolation) is done outside of the model inference (see the sketch below). In my GitHub repository, you can go through my sparseinst.py code and check the forward_test_3 and inference_test_3 definitions to see what changed and what was removed. Then, if you go to the eval_tensorrt_onnx.py file, there is a definition called postprocess (line 41) where the rest is done; for example, you can see the interpolation line (nn.UpsamplingBilinear2d(size=(height, width))) or the dictionary being created.
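
To make that split concrete, here is a rough sketch of the kind of host-side postprocess described above; the shapes, thresholds, and function signature are illustrative, not the exact code from the SparseInst_TensorRT repository:

```python
import torch
import torch.nn as nn

def postprocess(scores, classes, mask_logits, height, width,
                cls_threshold=0.005, mask_threshold=0.45):
    """Host-side post-processing: everything TensorRT could not handle stays here."""
    # Thresholding (the `keep = scores > cls_threshold` that had to leave the graph).
    keep = scores > cls_threshold
    scores, classes, mask_logits = scores[keep], classes[keep], mask_logits[keep]

    # Bilinear upsampling to the original image size, done outside the engine.
    upsample = nn.UpsamplingBilinear2d(size=(height, width))
    masks = upsample(mask_logits.unsqueeze(0)).squeeze(0) > mask_threshold

    # Rebuild the dictionary-style result the rest of the pipeline expects.
    return {"scores": scores, "pred_classes": classes, "pred_masks": masks}
```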

In conclusion :

  • Changed convert_onnx.py to produce the right outputs, removed ppm, changed model.forward
  • Changed sparseinst/sparseinst.py by removing the interpolation, turning the dictionary output into tensor outputs, and removing some other lines not compatible with TensorRT (keep = scores > self.cls_threshold)
  • Changed the preprocessing of the input so that batched_input is already prepared (see the initial forward definition vs mine)
  • Changed the postprocess and added back the lines I had to remove from model.forward, like the interpolation

My code is clearly not perfect, but I hope it helps. Good luck: TensorRT is really capricious sometimes, and I spent my time in the verbose output and netron.app ... 👍 😄

leandro-svg avatar Sep 26 '22 10:09 leandro-svg

Dear @leandro-svg, thank you so much!!! 👍 You saved me a lot of time. I totally understand your solution and am checking your code now. BTW, I also use netron.app to review the outputs and the network, which is why I got the general idea the first time I saw your post. TRT is not easy to learn at the early stage for me. You have done a great piece of work on this solution. 👍 👍 Thanks again. If I get any updates, I will tell you.

LiZheng1997 avatar Sep 26 '22 16:09 LiZheng1997

@leandro-svg I used a docker image (NVIDIA CUDA 11.7 Update 1 Preview, Pytorch 1.13.0a0+340c412, TensorRT 8.2.5) to build a TensorRT engine on an RTX 3070 card, which gets past the error about "Clip" not having max or min values; that error may be caused by a different ONNX or TRT version, I am still not sure. With the "Clip" error solved, I then used the eval_tensorrt_onnx.py in your repository to run inference on an image. It showed an error, "GenericMask cannot handle object's type", so I commented out line 51 in the post_process function in eval_tensorrt_onnx.py, which in turn caused an error about the data type not being handled. Also, line 274 in the _test_engine function in eval_tensorrt_onnx.py has to be uncommented if you want to save the inference result using visualizer.py from detectron2. Finally, I got a result for an image, but the result is not as good as the official one; this might be caused by taking the PPM module out of the backbone.
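
In case it helps others hitting the same "GenericMask cannot handle object's type" error: detectron2's GenericMask only accepts numpy arrays, polygon lists, or RLE dicts, so converting the mask output before calling the Visualizer usually avoids it. A sketch, assuming the masks come back as a torch tensor:

```python
import numpy as np
import torch

def masks_to_numpy(pred_masks):
    """Convert predicted masks to the uint8 numpy arrays detectron2's Visualizer expects."""
    if isinstance(pred_masks, torch.Tensor):
        pred_masks = pred_masks.detach().cpu().numpy()
    return pred_masks.astype(np.uint8)  # one HxW binary mask per instance
```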

In conclusion, I got a similar result to yours. In eval_tensorrt_onnx.py:

  • Line 51: commented out this line in the post_process function
  • Line 274: uncommented this line in the _test_engine function

Thanks for your work. 👍 💯

LiZheng1997 avatar Sep 28 '22 03:09 LiZheng1997

First off, thanks to the authors of SparseInst 😉

And I'll check the changes to those two lines, as well as the Clip node problem. Thank you.

leandro-svg avatar Sep 28 '22 05:09 leandro-svg