
Shifted segmentation mask output when converting from Keras to ONNX models

Open kgossage opened this issue 1 year ago • 2 comments

Describe the bug
I trained a U-Net style segmentation model using the exact model-generation code found here: https://keras.io/examples/vision/oxford_pets_image_segmentation/.

The segmentation mask lines up properly with the input RGB image (640x640x3 input, 2 output classes) when run through the Keras model, but is shifted by 15 pixels in each dimension when run through the converted ONNX model (the ONNX coordinates are smaller than the Keras coordinates by 15 pixels in both x and y). I've converted the model with both the tf2onnx.convert console command and the tf2onnx.convert.from_keras Python API, and both produce the same output. I've tried opsets 12-18 with no difference. This U-Net style model downsamples the image from 640x640 to 40x40 before scaling back up to 640x640, a factor of 16 in each dimension, and I suspect one of the upscaling layers is erroneously shifting (or, more likely, failing to shift) at each step.
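For reference, here is a minimal sketch of the conversion and comparison I'm describing (file names and the random stand-in image are placeholders; the real test uses my trained model and a dataset image):

```python
import numpy as np
import tensorflow as tf
import tf2onnx
import onnxruntime as ort

# Load the trained Keras model (placeholder path).
model = tf.keras.models.load_model("oxford_pets_unet.keras")

# Convert with the Python API; the tf2onnx.convert console command behaves the same.
spec = (tf.TensorSpec((1, 640, 640, 3), tf.float32, name="input"),)
onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature=spec, opset=18)
with open("oxford_pets_unet.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# Run the same image through both models and compare the argmax masks.
img = np.random.rand(1, 640, 640, 3).astype(np.float32)  # stand-in for a real RGB image
keras_mask = np.argmax(model.predict(img), axis=-1)[0]

sess = ort.InferenceSession("oxford_pets_unet.onnx")
onnx_mask = np.argmax(sess.run(None, {sess.get_inputs()[0].name: img})[0], axis=-1)[0]

# The masks only line up after shifting the ONNX output by 15 px in each dimension.
print(np.mean(keras_mask == onnx_mask))                        # low agreement as-is
print(np.mean(keras_mask[15:, 15:] == onnx_mask[:-15, :-15]))  # ~1.0 after the shift
```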

Urgency ASAP

System information

  • OS Platform and Distribution: Ubuntu 22.04.4 LTS
  • TensorFlow version: 2.15.1
  • Python version: 3.9.19
  • ONNX version: 1.17.0
  • tf2onnx version: 1.16.1/15c810

To Reproduce

Screenshots

Additional context

kgossage commented on Nov 14 '24

Could you please share an end-to-end example of how you discovered this problem, so we can reproduce it more efficiently?

fatcat-z commented on Nov 15 '24

Unfortunately I can't share the model or code at this point, but the example training code in the link I provided should reproduce it. Scaling the images to 640x640 reproduces it exactly, and the shift is still there at other image sizes I've tried. When I removed one of the downsample/upsample loop pairs from the model, the shift dropped from 15 to 7 pixels, so something is absolutely accumulating incrementally per upsampling step. I've tried replacing certain layer types with other layer types (UpSampling2D with Resizing, SeparableConv2D with Conv2D, etc.), but these didn't solve it.

I'll try to create a simple git repo based on the example code next week if I haven't figured it out by then, but I think all the code needed is already in the Keras link: train the model to segment dogs, save it as a .keras model, convert it to ONNX, then run an image through both models and you should see a 15x15 shift between the two output segmentation masks. A minimal sketch of how I'm isolating the upsampling block is below.
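This is my own diagnostic, not code from the Keras example; the layer sizes are arbitrary. It converts a single upsampling block and checks whether Keras and ONNX already disagree at that level:

```python
import numpy as np
import tensorflow as tf
import tf2onnx
import onnxruntime as ort

# Build one upsampling block in isolation.
inp = tf.keras.Input((40, 40, 8))
x = tf.keras.layers.UpSampling2D(2)(inp)
x = tf.keras.layers.SeparableConv2D(8, 3, padding="same")(x)
tiny = tf.keras.Model(inp, x)

spec = (tf.TensorSpec((1, 40, 40, 8), tf.float32, name="x"),)
onnx_model, _ = tf2onnx.convert.from_keras(tiny, input_signature=spec, opset=18)
sess = ort.InferenceSession(onnx_model.SerializeToString())

# Push the same random tensor through both and compare.
t = np.random.rand(1, 40, 40, 8).astype(np.float32)
keras_out = tiny(t).numpy()
onnx_out = sess.run(None, {sess.get_inputs()[0].name: t})[0]

# A large max difference here would finger this block as the source of the shift.
print(np.abs(keras_out - onnx_out).max())
```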

Kirk

kgossage commented on Nov 15 '24