org.tensorflow.exceptions.TFInvalidArgumentException: Incompatible shapes: [1,14,14,128] vs. [1,20,20,128]

Open danieltog opened this issue 2 years ago • 3 comments

Description

Greetings, I'm using a custom-made model with DJL for object detection, but inference fails with a shape error: Incompatible shapes: [1,14,14,128] vs. [1,20,20,128] (for MOBILENET_V2_I320) and Incompatible shapes: [1,14,14,128] vs. [1,16,16,128] (for MOBILENET_V2).

Expected Behavior

It should return JSON containing the coordinates of the objects detected in the analyzed image.

Error Message

org.tensorflow.exceptions.TFInvalidArgumentException: Incompatible shapes: [1,14,14,128] vs. [1,20,20,128] [[{{function_node __inference__wrapped_model_117326}}{{node object_detector_model/retina_net_model/fpn/add/add}}]]] with root cause

How to Reproduce?

Use the following controller method in Java to reproduce the error:

@GetMapping("/predict_custom_object")
public static String predictCustomObject() throws IOException, ModelException, TranslateException {
    Path imageFile = Paths.get("input/object_recognition/beatles.jpeg");
    Image img = ImageFactory.getInstance().fromFile(imageFile);

    String modelUrl = "http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz";

    Criteria<Image, DetectedObjects> criteria =
            Criteria.builder()
                    .optApplication(Application.CV.OBJECT_DETECTION)
                    .setTypes(Image.class, DetectedObjects.class)
                    .optModelUrls(modelUrl)
                    // saved_model.pb file is in the subfolder of the model archive file
                    .optModelName("saved_model")
                    .optTranslator(new MyTranslator())
                    .optEngine("TensorFlow")
                    .optProgress(new ProgressBar())
                    .build();

    try (ZooModel<Image, DetectedObjects> model = criteria.loadModel();
         Predictor<Image, DetectedObjects> predictor = model.newPredictor()) {
        DetectedObjects detection = predictor.predict(img);
        examplesService.saveBoundingBoxImage(img, detection);
        return detection.toString();
    }
}

public static final class MyTranslator
        implements NoBatchifyTranslator<Image, DetectedObjects> {

    private Map<Integer, String> classes;
    private int maxBoxes;
    private float threshold;

    MyTranslator() {
        maxBoxes = 10;
        threshold = 0.7f;
    }

    /** {@inheritDoc} */
    @Override
    public NDList processInput(TranslatorContext ctx, Image input) {
        // input to tf object-detection models is a list of tensors, hence NDList
        NDArray array = input.toNDArray(ctx.getNDManager(), Image.Flag.COLOR);
        // optionally resize the image for faster processing
        array = NDImageUtils.resize(array, 224);
        // tf object-detection models expect 8 bit unsigned integer tensor
        array = array.toType(DataType.FLOAT32, true);
        array = array.expandDims(0); // tf object-detection models expect a 4 dimensional input
        return new NDList(array);
    }

    /** {@inheritDoc} */
    @Override
    public void prepare(TranslatorContext ctx) throws IOException {
        if (classes == null) {
            classes = examplesService.loadSynset();
        }
    }

    /** {@inheritDoc} */
    @Override
    public DetectedObjects processOutput(TranslatorContext ctx, NDList list) {
        // output of tf object-detection models is a list of tensors, hence NDList in djl
        // output NDArray order in the list are not guaranteed

        int[] classIds = null;
        float[] probabilities = null;
        NDArray boundingBoxes = null;
        for (NDArray array : list) {
            if ("detection_boxes".equals(array.getName())) {
                boundingBoxes = array.get(0);
            } else if ("detection_scores".equals(array.getName())) {
                probabilities = array.get(0).toFloatArray();
            } else if ("detection_classes".equals(array.getName())) {
                // class id is between 1 - number of classes
                classIds = array.get(0).toType(DataType.INT32, true).toIntArray();
            }
        }
        Objects.requireNonNull(classIds);
        Objects.requireNonNull(probabilities);
        Objects.requireNonNull(boundingBoxes);

        List<String> retNames = new ArrayList<>();
        List<Double> retProbs = new ArrayList<>();
        List<BoundingBox> retBB = new ArrayList<>();

        // result are already sorted
        for (int i = 0; i < Math.min(classIds.length, maxBoxes); ++i) {
            int classId = classIds[i];
            double probability = probabilities[i];
            // classId starts from 1, -1 means background
            if (classId > 0 && probability > threshold) {
                String className = classes.getOrDefault(classId, "#" + classId);
                float[] box = boundingBoxes.get(i).toFloatArray();
                float yMin = box[0];
                float xMin = box[1];
                float yMax = box[2];
                float xMax = box[3];
                Rectangle rect = new Rectangle(xMin, yMin, xMax - xMin, yMax - yMin);
                retNames.add(className);
                retProbs.add(probability);
                retBB.add(rect);
            }
        }

        return new DetectedObjects(retNames, retProbs, retBB);
    }
}

Steps to reproduce

I used Postman to call the REST endpoint.

What have you tried to solve it?

I've trained models with mediapipe_model_maker using both MOBILENET_V2_I320 and MOBILENET_V2.

Environment Info

I'm using a Spring Boot project to test DJL on a MacBook.

danieltog avatar Oct 18 '23 08:10 danieltog

@danieltog

You need to make sure the NDArray shape and data type match what the model expects:

  1. the model expects the uint8 data type
  2. the image size should be 320x320

You can read this doc for detail: https://docs.djl.ai/master/docs/tensorflow/how_to_import_tensorflow_models_in_DJL.html#tips-and-tricks-when-writing-translator-for-tensorflow-models

Here is the code that works for me:

    public static final class MyTranslator implements NoBatchifyTranslator<Image, DetectedObjects> {

        private List<String> classes;
        private int maxBoxes;
        private float threshold;

        MyTranslator() {
            maxBoxes = 10;
            threshold = 0.7f;
        }

        /**
         * {@inheritDoc}
         */
        @Override
        public NDList processInput(TranslatorContext ctx, Image input) {
            // input to tf object-detection models is a list of tensors, hence NDList
            NDArray array = input.toNDArray(ctx.getNDManager(), Image.Flag.COLOR);
            Transform transform = new Resize(320);
            array = transform.transform(array);
            // tf object-detection models expect 8 bit unsigned integer tensor
            array = array.toType(DataType.UINT8, true);
            array = array.expandDims(0); // tf object-detection models expect a 4 dimensional input
            return new NDList(array);
        }

        /**
         * {@inheritDoc}
         */
        @Override
        public void prepare(TranslatorContext ctx) throws IOException {
            if (classes == null) {
                Path path = Paths.get("classes.txt");
                classes = Utils.readLines(path);
            }
        }

        /**
         * {@inheritDoc}
         */
        @Override
        public DetectedObjects processOutput(TranslatorContext ctx, NDList list) {
            // output of tf object-detection models is a list of tensors, hence NDList in djl
            // output NDArray order in the list are not guaranteed

            int[] classIds = null;
            float[] probabilities = null;
            NDArray boundingBoxes = null;
            for (NDArray array : list) {
                if ("detection_boxes".equals(array.getName())) {
                    boundingBoxes = array.get(0);
                } else if ("detection_scores".equals(array.getName())) {
                    probabilities = array.get(0).toFloatArray();
                } else if ("detection_classes".equals(array.getName())) {
                    // class id is between 1 - number of classes
                    classIds = array.get(0).toType(DataType.INT32, true).toIntArray();
                }
            }
            Objects.requireNonNull(classIds);
            Objects.requireNonNull(probabilities);
            Objects.requireNonNull(boundingBoxes);

            List<String> retNames = new ArrayList<>();
            List<Double> retProbs = new ArrayList<>();
            List<BoundingBox> retBB = new ArrayList<>();

            // result are already sorted
            for (int i = 0; i < Math.min(classIds.length, maxBoxes); ++i) {
                int classId = classIds[i];
                double probability = probabilities[i];
                // classId starts from 1, -1 means background
                if (classId > 0 && probability > threshold) {
                    // assumes the line index in classes.txt lines up with the model's class id
                    String className = classes.get(classId);
                    float[] box = boundingBoxes.get(i).toFloatArray();
                    float yMin = box[0];
                    float xMin = box[1];
                    float yMax = box[2];
                    float xMax = box[3];
                    Rectangle rect = new Rectangle(xMin, yMin, xMax - xMin, yMax - yMin);
                    retNames.add(className);
                    retProbs.add(probability);
                    retBB.add(rect);
                }
            }

            return new DetectedObjects(retNames, retProbs, retBB);
        }
    }
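
The numbers in the error message are consistent with the resize step being the culprit. Below is a minimal arithmetic sketch, assuming the FPN level named in the error (`fpn/add/add`) sits at a downsampling stride of 16 and that the two Model Maker variants were exported for 256x256 (MOBILENET_V2) and 320x320 (MOBILENET_V2_I320) inputs. The stride value and the `FeatureMapSize` class are illustrative assumptions, not something read from the model. Under those assumptions, a 224x224 input yields a 14x14 feature map, while the graph expects 16x16 or 20x20.

```java
public class FeatureMapSize {

    // Side length of a feature map after downsampling an input by `stride`,
    // assuming "same"-style padding, i.e. ceiling division.
    static int featureMapSide(int inputSide, int stride) {
        return (inputSide + stride - 1) / stride;
    }

    public static void main(String[] args) {
        int stride = 16; // assumption: stride of the FPN level in the error message
        int[] inputSides = {224, 256, 320};
        for (int side : inputSides) {
            int fm = featureMapSide(side, stride);
            // 128 is the channel count taken from the error message
            System.out.println(side + "x" + side + " input -> [1," + fm + "," + fm + ",128]");
        }
    }
}
```

If this reading is right, resizing the image to the size the model was exported with (rather than 224) should make the two operands of the add node agree.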

frankfliu avatar Oct 19 '23 22:10 frankfliu

@frankfliu

I changed the image size to 320x320, and it worked fine. However, for the custom model I trained, I had to keep the data type as float; otherwise, it threw an error.

I trained a custom model for recognizing fruits using a Colab notebook. But when I transfer the "saved_model.pb" file and the "variables" folder to my local Java environment and run the model, it keeps giving me results for bikes and cars, even when I input images of fruits. The output picture is actually the fruit image I used as input, but it doesn't provide any bounding boxes.

I'm wondering if I'm saving the "saved_model.pb" incorrectly, or why the model is making predictions for cars and bikes when it was trained specifically to recognize fruits.

danieltog avatar Oct 20 '23 02:10 danieltog

You need to create your own classes.txt that contains your fruit names.

I changed your code; you might need to change it back to your original one:

        public void prepare(TranslatorContext ctx) throws IOException {
            if (classes == null) {
                Path path = Paths.get("classes.txt");
                classes = Utils.readLines(path);
            }
        }
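
As a rough illustration of how a label file relates to the class IDs the model returns, here is a sketch that builds a 1-based ID-to-name map from the lines of a classes.txt-style file. It assumes one class name per line, in the same order as the labels used during training, and that class IDs start at 1 (as the translator comments say). The `LabelMap` class and the fruit names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LabelMap {

    // Builds a class-id -> name map, assigning ids starting at 1.
    // Blank lines are skipped and do not consume an id.
    static Map<Integer, String> fromLines(List<String> lines) {
        Map<Integer, String> classes = new LinkedHashMap<>();
        int id = 1;
        for (String line : lines) {
            String name = line.trim();
            if (!name.isEmpty()) {
                classes.put(id++, name);
            }
        }
        return classes;
    }

    public static void main(String[] args) {
        // Hypothetical contents of classes.txt for a fruit model
        List<String> lines = Arrays.asList("apple", "banana", "orange");
        Map<Integer, String> classes = fromLines(lines);

        // Known id resolves to its name; unknown id falls back to "#<id>"
        System.out.println(classes.getOrDefault(2, "#2"));
        System.out.println(classes.getOrDefault(7, "#7"));
    }
}
```

Using `getOrDefault` with a `"#" + classId` fallback, as in the original translator, avoids an exception when the model emits an ID that has no entry in the file.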

frankfliu avatar Oct 20 '23 05:10 frankfliu