pytorch-CycleGAN-and-pix2pix
Exporting the pix2pix model to ONNX, but the inference result from the ONNX model is incorrect
Recently, I needed to export the pix2pix model to ONNX in order to deploy it in other applications.
I referenced the conversion code from this post: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/issues/1113#issuecomment-728681741
Although I successfully converted the pix2pix model to ONNX, the ONNX model produces an incorrect result compared to the .pth model output for the same input.
I surveyed the paper and found some helpful information:
- Instead, for our final models, we provide noise only in the form of dropout, applied on several layers of our generator at both training and test time. Despite the dropout noise, we observe only minor stochasticity in the output of our nets.
- At inference time, we run the generator net in exactly the same manner as during the training phase. This differs from the usual protocol in that we apply dropout at test time, and we apply batch normalization [26] using the statistics of the test batch, rather than aggregated statistics of the training batch.
That means we can't disable the dropout layers (and batch norm?) at inference time or when converting the model from .pth to ONNX. So I set training=TrainingMode.TRAINING in the export API and set model.training=True before calling export. Unfortunately, I got the same incorrect result after that. I have no idea why...
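For reference, this is roughly the export call I tried (a minimal sketch; the checkpoint path, generator options, and input size are placeholders from my setup and may need adjusting):

```python
import torch
from torch.onnx import TrainingMode
from models import networks  # from the pix2pix repo

# Rebuild the generator with the same options used for training
# (these options are placeholders; match your own training settings).
netG = networks.define_G(3, 3, 64, 'unet_256', norm='batch',
                         use_dropout=True, gpu_ids=[])
netG.load_state_dict(torch.load('latest_net_G.pth', map_location='cpu'))

netG.train()  # keep dropout/batch norm in training behaviour, as the paper describes
dummy = torch.randn(1, 3, 256, 256)

torch.onnx.export(
    netG, dummy, 'pix2pix_G.onnx',
    training=TrainingMode.TRAINING,  # export normalization/dropout in training mode
    do_constant_folding=False,       # constant folding must be disabled in TRAINING mode
    opset_version=12,                # Dropout needs opset >= 12 to carry a training flag
    input_names=['input'], output_names=['output'])
```

Even with this, the ONNX output still does not match the .pth output.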
The following images show the input data and the outputs from the ONNX model and the .pth model.
| input data x | pth model output y (expected result) | onnx model output y (incorrect result) |
|---|---|---|
| *(image)* | *(image)* | *(image)* |
I have the same problem with CycleGAN using 'unet_256'. Can anyone give me some help?
If you train with --norm instance, the problem should go away. I noticed that if you run the test program after calling eval() on netG, the result will also be wrong. This is very strange, because dropout and batch norm are supposed to behave differently in training and inference, but pix2pix seems to want the exact same behavior in both. Since ONNX is by definition for inference only, it is possible that batch norm, behaving as expected for inference, breaks everything. Anyway, using instance norm removes the problem.
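As a rough illustration of why eval() changes the result with batch norm but not with instance norm (a minimal standalone sketch, not the repo code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 8, 32, 32)

bn = nn.BatchNorm2d(8)                                   # repo default for --norm batch
inorm = nn.InstanceNorm2d(8, track_running_stats=False)  # repo default for --norm instance

# Training mode: both normalize with statistics of the current input.
bn_train, in_train = bn.train()(x), inorm.train()(x)
# Eval mode: batch norm switches to its running statistics, instance norm does not.
bn_eval, in_eval = bn.eval()(x), inorm.eval()(x)

print('BatchNorm    train vs eval max diff:', (bn_train - bn_eval).abs().max().item())  # > 0
print('InstanceNorm train vs eval max diff:', (in_train - in_eval).abs().max().item())  # 0
```

An ONNX graph exported for inference bakes in the eval-mode batch norm (running statistics), which is exactly the behaviour pix2pix does not want; instance norm has no such train/eval switch, so the exported model matches.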
@synthetica3d Thank you! After applying your suggestion, re-training, and then converting to an ONNX file, I get the expected result from ONNX. So, can I summarize that this happens because the ONNX conversion changes the batch normalization behavior but does not change instance normalization?
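For anyone hitting the same issue, this is roughly how I compared the re-trained instance-norm generator against its ONNX export (a sketch; file names and generator options are assumptions from my setup, and it expects the export used input_names=['input']):

```python
import numpy as np
import onnxruntime as ort
import torch
from models import networks  # from the pix2pix repo

# Rebuild and load the instance-norm generator (placeholder checkpoint name).
netG = networks.define_G(3, 3, 64, 'unet_256', norm='instance',
                         use_dropout=True, gpu_ids=[])
netG.load_state_dict(torch.load('latest_net_G_instance.pth', map_location='cpu'))
netG.eval()  # with instance norm, eval() no longer changes the normalization output

x = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    torch_out = netG(x).numpy()

sess = ort.InferenceSession('pix2pix_G_instance.onnx')  # exported from this netG in eval mode
onnx_out = sess.run(None, {'input': x.numpy()})[0]
print('max abs diff:', np.abs(torch_out - onnx_out).max())
```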
Did you try exporting the face enhancement model to ONNX?
I got the same problem, and calling eval() is the reason.
