Problems Converting Keras-Contrib Capsule Layer to .Onnx Format

Open MattWard97 opened this issue 5 years ago • 6 comments

Hello,

I have keras-contrib installed as a PyPi package and I am using a capsule layer (linked the code below) in one of my models.

https://github.com/keras-team/keras-contrib/blob/master/keras_contrib/layers/capsule.py

Previously I was saving my model as an .h5 file. When loading that file I had to pass some custom objects (code shown below):

clf = keras.models.load_model('7nn_model.h5', custom_objects={'Capsule':Capsule, 'squash': squash})

I needed the custom objects in order to get the same inference-time performance from my model as I saw while it was being trained. I think I am running into a similar issue with the .onnx format: after loading my .onnx file, the model's inference-time performance differs from what I saw when it was originally trained.

I was hoping someone could shed some light on whether the discrepancy between training and inference-time performance is in fact due to the capsule layer not being properly saved in the .onnx format. Any suggestions on how to address this are appreciated.

MattWard97 avatar Aug 24 '20 03:08 MattWard97

It looks like you have two onnx models and they have different inference times? What is the difference between these two onnx models?

jiafatom avatar Aug 24 '20 04:08 jiafatom

@jiafatom Sorry if it was unclear, David. I do not have two .onnx models. Hopefully this clears things up. I have one file in which I train a neural network (which contains a capsule layer from keras-contrib), and once it is trained I save that network as a .onnx file. I then have a second file where I load the saved .onnx file and evaluate the model a second time (just for verification purposes). The issue looks like this: when the model is originally trained it gets train score = 95% and validation score = 93%. When I load the .onnx file containing the trained model and evaluate it again, I get train score = 60%, validation score = 58%, and test score = 57%.

When I previously used .h5 instead of .onnx, I had this same type of issue with models that included a capsule layer. I resolved it by including the custom objects line of code from my original post.

MattWard97 avatar Aug 24 '20 04:08 MattWard97

So you have a keras model with a custom layer, and you convert it with keras2onnx to get an onnx model, and your inference results for keras vs. onnx do not match? Then it seems like a converter issue. Can you pull the latest keras2onnx master and onnxconverter-common master and see if the issue still exists? If it does, can you share a simple model that reproduces your problem?
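One way to pin down a Keras vs. ONNX mismatch is to compare the raw outputs of both runtimes on the same inputs, rather than only the final accuracy. The following is a minimal sketch under stated assumptions: `compare_predictions` is a hypothetical helper, not part of keras2onnx or onnxruntime, and the numbers are made up for illustration.

```python
def compare_predictions(keras_probs, onnx_probs, atol=1e-4):
    """Compare two batches of class-probability rows (lists of lists).

    Hypothetical helper: the name and the default tolerance are
    illustrative choices, not taken from any library.
    """
    if not keras_probs or len(keras_probs) != len(onnx_probs):
        raise ValueError("both batches must be non-empty and equally long")

    max_abs_diff = 0.0
    label_matches = 0
    for k_row, o_row in zip(keras_probs, onnx_probs):
        if len(k_row) != len(o_row):
            raise ValueError("rows must have the same number of classes")
        # Track the largest element-wise gap between the two runtimes.
        for k, o in zip(k_row, o_row):
            max_abs_diff = max(max_abs_diff, abs(k - o))
        # Check whether both runtimes pick the same class for this sample.
        if k_row.index(max(k_row)) == o_row.index(max(o_row)):
            label_matches += 1

    return {
        "max_abs_diff": max_abs_diff,
        "label_agreement": label_matches / len(keras_probs),
        "within_tolerance": max_abs_diff <= atol,
    }


# Made-up numbers purely for illustration, not results from this issue.
keras_out = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10]]
onnx_out = [[0.30, 0.40, 0.30], [0.10, 0.80, 0.10]]
report = compare_predictions(keras_out, onnx_out)
print(report)
```

In practice the two arguments would presumably be the Keras model's `predict` output and the first output returned by an onnxruntime `InferenceSession.run` call, each converted to nested lists. A large `max_abs_diff` on identical inputs points at the conversion rather than at the evaluation code.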

jiafatom avatar Aug 24 '20 04:08 jiafatom

@jiafatom Hi David, sorry for the delay. I'm not an expert on git-based version control systems, but to try to do what you asked I deleted all my onnx-related packages from my machine, made sure pip was upgraded, and then re-downloaded all the pip packages I needed. I am still having the issue.

I used the MNIST dataset to make a couple of scripts that can recreate this issue. As long as you have the required packages installed, you should be able to simply run the two scripts and encounter the same problem (if of course there is a problem, and I didn't just make a mistake). The issue occurs when I use a model that also includes causal padding on 1D convolutional layers. Is causal padding supported by onnx? If the causal padding is removed, the issue seems to go away.
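For context on what causal padding does: in Keras, `padding="causal"` on a `Conv1D` layer left-pads the input with `dilation_rate * (kernel_size - 1)` zeros, so the output at step t depends only on steps up to t. The sketch below is a plain-Python, single-channel illustration of that idea. The `conv1d` function is hypothetical and is not the code of Keras or of any converter. It shows how the output would differ if causal padding were treated as "same" padding, which is one conceivable source of a mismatch and not a confirmed diagnosis.

```python
def conv1d(signal, kernel, padding="valid", dilation=1):
    """Single-channel 1D cross-correlation with stride 1.

    Illustrative helper only; it mimics the padding modes by name.
    """
    span = dilation * (len(kernel) - 1)
    if padding == "causal":
        # All zeros go on the left, so no future time step leaks in.
        padded = [0.0] * span + list(signal)
    elif padding == "same":
        # Zeros are split between both ends of the sequence.
        left = span // 2
        padded = [0.0] * left + list(signal) + [0.0] * (span - left)
    elif padding == "valid":
        padded = list(signal)
    else:
        raise ValueError(f"unknown padding mode: {padding!r}")

    out_len = len(padded) - span
    return [
        sum(kernel[j] * padded[t + j * dilation] for j in range(len(kernel)))
        for t in range(out_len)
    ]


signal = [1, 2, 3, 4]
kernel = [1, 1, 1]
causal_out = conv1d(signal, kernel, padding="causal")
same_out = conv1d(signal, kernel, padding="same")
print(causal_out)  # [1.0, 3.0, 6.0, 9.0]
print(same_out)    # [3.0, 6.0, 9.0, 7.0]
```

Both modes keep the sequence length, so a model would still run end to end if one were silently swapped for the other, but every downstream activation would change.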

The script I made uses the MNIST dataset, but only digits 0, 2, and 6, to make for a simpler classifier. Additionally, the MNIST images are flattened and used as if they were some type of time series data.
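The preprocessing described above can be sketched roughly as follows. This is a plain-Python illustration under assumptions, not the contents of the linked scripts: the helper name `select_and_flatten`, the remapping of labels 0, 2, 6 to class indices 0, 1, 2, and the (time steps, 1 feature) layout are all hypothetical choices, and tiny 2x2 arrays stand in for the 28x28 images.

```python
KEEP_DIGITS = (0, 2, 6)


def select_and_flatten(images, labels, keep=KEEP_DIGITS):
    """Keep only the chosen digits and flatten each image into a sequence."""
    # Assumed remapping: original digit -> contiguous class index.
    class_index = {digit: i for i, digit in enumerate(keep)}
    sequences, targets = [], []
    for image, label in zip(images, labels):
        if label not in class_index:
            continue
        # Row-major flatten: each pixel becomes one time step with one feature.
        sequences.append([[pixel] for row in image for pixel in row])
        targets.append(class_index[label])
    return sequences, targets


# Tiny 2x2 stand-ins for 28x28 MNIST images.
images = [[[0, 1], [2, 3]], [[4, 5], [6, 7]], [[8, 9], [10, 11]]]
labels = [6, 1, 0]
sequences, targets = select_and_flatten(images, labels)
print(sequences)  # [[[0], [1], [2], [3]], [[8], [9], [10], [11]]]
print(targets)    # [2, 0]
```

With real MNIST data each sequence would have 784 time steps, which is the kind of input a 1D convolutional layer expects.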

CapsuleModelSaver.py https://drive.google.com/file/d/13ZbIJh0s7thmk43Iu1dupvpJjoxHHsdx/view?usp=sharing

CapsuleOnnxTester.py https://drive.google.com/file/d/1UFeFekpuly9MmVlvDOs5GXNMb1U8h21B/view?usp=sharing

When CapsuleModelSaver.py is run, the training performance is 80% and the testing performance is 78%. When CapsuleOnnxTester.py is run, the training performance is 33% and the testing performance is 33%.

Please let me know if you cannot use the Google Drive links due to permission issues and I will fix them asap. Thank you.

MattWard97 avatar Aug 25 '20 22:08 MattWard97

@jiafatom I just wanted to follow up. Were you able to reproduce the error I am experiencing? Thank you.

MattWard97 avatar Oct 29 '20 04:10 MattWard97

@MattWard97 Currently I am working on some other projects, so I am afraid I don't have the bandwidth to look at this issue. Thanks for your understanding.

jiafatom avatar Oct 29 '20 04:10 jiafatom