
"Cannot infer the missing size in [-1, 0] when there are 0 elements" when using padding in custom converted TFJS model

FabioRomagnolo opened this issue 3 years ago • 4 comments

Hi! I'm using a custom model for Background Matting that is tested and working on ONNX Runtime, ONNX Runtime Web, and TensorFlow (0.11 s inference time on GPU). To use it, I convert the custom PyTorch model to ONNX, then to TensorFlow, and then to TensorFlow.js, using the recommended libraries, namely onnx-tensorflow and tensorflowjs_converter.
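
For reference, here is a minimal sketch of the pipeline I'm using (file names and the dummy input shape are placeholders; the real model takes two image inputs):

import torch
import onnx
from onnx_tf.backend import prepare

# model: the loaded PyTorch nn.Module (not shown here).
# 1. Export the PyTorch model to ONNX.
model.eval()
dummy = torch.randn(1, 3, 1080, 1920)
torch.onnx.export(model, dummy, "model.onnx", opset_version=12)

# 2. Convert the ONNX model to a TensorFlow SavedModel with onnx-tensorflow.
tf_rep = prepare(onnx.load("model.onnx"))
tf_rep.export_graph("model_tf")

# 3. Convert the SavedModel to TFJS on the command line:
#    tensorflowjs_converter --input_format=tf_saved_model model_tf model_tfjs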

The problem is that this model does not work at all with TensorFlow.js. I've found that the following error: "Cannot infer the missing size in [-1, 0] when there are 0 elements" is caused by PyTorch's padding function, and I can't understand why.

The problematic line is something like this:

from torch.nn import functional as F

# Pad the last two dimensions (W and H) by 3 on each side.
x = F.pad(x, (3, 3, 3, 3))
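
For a 4D NCHW tensor this pads the last two dimensions (W and H) by 3 on each side, so the converted graph should end up doing the equivalent of the following static tf.pad (a sketch of what the op means, not necessarily what the converter actually emits):

import tensorflow as tf

x = tf.zeros([1, 3, 32, 32])  # placeholder NCHW input
# Pad only the H and W dimensions by 3 on each side.
y = tf.pad(x, [[0, 0], [0, 0], [3, 3], [3, 3]])
print(y.shape)  # (1, 3, 38, 38)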

I've done several tests and I'm sure that the TFJS problem is caused only by this padding operation: removing it makes the model run, but it obviously breaks all the dimensions, so it's a function I absolutely need in my model. I also have a simplified version with less accurate results that does not use the padding, and it works on TFJS.

Is this actually a TFJS bug? Is there a workaround to make padding work? Any help is much appreciated.

FabioRomagnolo avatar Jul 05 '22 10:07 FabioRomagnolo

hi @FabioRomagnolo It could very possibly be that during the conversion PyTorch => ONNX => TF => TFJS something got lost in translation. Can you help create a minimal model that includes only the pad function and share all the model artifacts from the conversion pipeline with us? thanks

pyu10055 avatar Jul 07 '22 17:07 pyu10055

Thanks for your response. I've converted this simple PyTorch model (ignore the output names):

from torch import nn
from torch.nn import functional as F


class PadTest(nn.Module):
    """
    A simple model executing padding operation on two input images to test the conversion to TFJS.
    """

    def __init__(self, padding=None):
        super().__init__()
        if padding is None:
            padding = [3, 3, 3, 3]
        self.padding = padding

    def forward(self, src, bgr):
        padded_src = F.pad(src, self.padding)
        padded_bgr = F.pad(bgr, self.padding)
        return padded_src, padded_bgr
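
For completeness, this is roughly how I exported it (input shapes and file name are placeholders):

import torch

model = PadTest().eval()
src = torch.randn(1, 3, 256, 256)
bgr = torch.randn(1, 3, 256, 256)
# Opset 12 for compatibility with the ONNX -> TensorFlow converter (see my note below).
torch.onnx.export(model, (src, bgr), "pad_test.onnx", opset_version=12)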

After converting to a simplified ONNX model with onnx-simplifier, the conversion to TensorFlow crashes with the error "Invalid value in tensor used for shape: -3" inside a tf.slice() operation, which is already bad.
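
For reference, the simplification step is just the standard onnx-simplifier call, roughly:

import onnx
from onnxsim import simplify

model = onnx.load("pad_test.onnx")
model_simplified, check = simplify(model)
assert check, "Simplified ONNX model could not be validated"
onnx.save(model_simplified, "pad_test_simplified.onnx")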

Anyway, skipping the ONNX simplification and converting the raw ONNX model to TF and then to TFJS, everything seems to go well.
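
The converted SavedModel can be sanity-checked directly in Python, e.g. (a sketch; the signature and input key names depend on the export and may differ):

import numpy as np
import tensorflow as tf

loaded = tf.saved_model.load("pad_test_tf")
infer = loaded.signatures["serving_default"]
src = tf.constant(np.zeros((1, 3, 256, 256), dtype=np.float32))
bgr = tf.constant(np.zeros((1, 3, 256, 256), dtype=np.float32))
outputs = infer(src=src, bgr=bgr)
print({k: v.shape for k, v in outputs.items()})  # H and W should grow by 6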

But even though the TF model works, the TFJS model throws this error during execution: "Error: GatherV2: the index value 0 is not in [0, -1]", and this is the error that leads to the "Cannot infer the missing size in [-1, 0] when there are 0 elements" error in my original model.

This is the download link for the models. The simplified version of the TF model is not included because, as stated above, its conversion crashes before completing.

FabioRomagnolo avatar Jul 08 '22 09:07 FabioRomagnolo

@FabioRomagnolo I took a look at the generated models; they are quite complex. It seems ONNX turns constants into a sparse representation. Is it possible to disable that during conversion?

pyu10055 avatar Jul 08 '22 21:07 pyu10055

> @FabioRomagnolo I took a look at the generated models; they are quite complex. It seems ONNX turns constants into a sparse representation. Is it possible to disable that during conversion?

That's impossible: you can only change the opset* during conversion and choose whether or not to apply constant folding, which is unrelated to the problem described above.

*The opset compatible with my model is actually version 12, because the Squeeze v13 operation is not supported by the ONNX -> TensorFlow converter.

FabioRomagnolo avatar Jul 11 '22 08:07 FabioRomagnolo

@FabioRomagnolo Any luck finding a solution to this issue?

Pensarfeo avatar Jan 11 '23 19:01 Pensarfeo

> @FabioRomagnolo Any luck finding a solution to this issue?

Actually, no. I only succeeded in converting the native TensorFlow model to TensorFlow.js.

FabioRomagnolo avatar Jan 11 '23 19:01 FabioRomagnolo

Hi, @FabioRomagnolo

Thank you for opening this issue. Since this issue has been open for a long time, the code/debug information in it may no longer be relevant to the current state of the code base.

The TFJS team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TFJS version with the latest compatible hardware configuration, which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings and all the debugging information that could help us investigate.

Please follow the release notes to stay up to date with the latest developments happening in the TensorFlow.js space.

Thank you for your support and cooperation.

gaikwadrahul8 avatar Sep 03 '23 22:09 gaikwadrahul8

@gaikwadrahul8 Just for posterity, I solved this issue by using: https://github.com/PINTO0309/onnx2tf
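
In case it helps others, the replacement pipeline is roughly the following (parameter names taken from the onnx2tf README; verify them against the version you install):

import onnx2tf

# onnx2tf converts NCHW ONNX graphs to NHWC TensorFlow graphs, which seems to
# avoid the dynamic-shape/GatherV2 issues described above.
onnx2tf.convert(
    input_onnx_file_path="model.onnx",
    output_folder_path="saved_model",
)

# Then, as before:
#   tensorflowjs_converter --input_format=tf_saved_model saved_model model_tfjs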

Pensarfeo avatar Sep 04 '23 06:09 Pensarfeo

This issue has been marked stale because it has had no activity in the past 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar Sep 12 '23 01:09 github-actions[bot]

This issue was closed due to lack of activity after being marked stale for the past 7 days.

github-actions[bot] avatar Sep 19 '23 01:09 github-actions[bot]

Are you satisfied with the resolution of your issue?

google-ml-butler[bot] avatar Sep 19 '23 01:09 google-ml-butler[bot]