
Memory leak for Yolov5 inferences

Open AndyDPesspd opened this issue 2 years ago • 26 comments

Description

I tried to deploy Yolov5 on an Ubuntu PC using DJL, but a memory leak occurred during inference. After experimenting, I found that the leak comes only from this line: results = this.predictor.predict(img);

I wonder if this is a problem in DJL itself, so I'm asking here whether there is a solution. Below I provide my code so you can check that the memory leak is not due to my own negligence.

Expected Behavior

After 10,000 inferences (CPU inference), the memory usage does not change much.

Error Message

Memory leakage occurs after the program runs for a while; for example, memory usage increases sharply after about 2,500 inferences.

How to Reproduce?

Please check whether there's a problem. Here is my code:

public Yolov5() throws IOException, ModelNotFoundException, MalformedModelException {
    String nativeTempDir = System.getProperty("java.io.tmpdir");
    String resourceFileName = "/yolov5s";
    File directory = new File(nativeTempDir + File.separator + resourceFileName);

    logger.info("Start to build Translator.");
    Translator<Image, DetectedObjects> translator = YoloV5Translator.builder()
            .optSynsetArtifactName("result.names")
            .build();
    logger.info("Start to build Criteria.");
    Criteria<Image, DetectedObjects> criteria = Criteria.builder()
            .setTypes(Image.class, DetectedObjects.class)
            .optDevice(Device.cpu())
            .optModelUrls(String.valueOf(directory.toURI()))
            .optModelName("best.torchscript")
            .optTranslator(translator)
            .optEngine("PyTorch")
            .build();
    this.translator = translator;
    this.criteria = criteria;
    this.model = ModelZoo.loadModel(this.criteria);
    this.predictor = this.model.newPredictor();
}

    private Mat old;

    private Mat frame = new Mat();

    private Image img;

    private DetectedObjects results;

    public DetectResult detect(String moduleNumber, String pointNumber, String imgPath, String saveDir) {

        logger.info("Start to preprocess the image.");

        File dir = new File(saveDir);
        if (!dir.exists() && !dir.isDirectory()) {
            dir.mkdirs();
        }

        String savePath = saveDir + File.separator + imgPath.substring(imgPath.lastIndexOf(File.separator) + 1);

        logger.info("Yolov5 inference begins.");


    try {
        old = Imgcodecs.imread(imgPath);
        Imgproc.resize(old, frame, new Size(640, 640));
        img = ImageFactory.getInstance().fromImage(HighGui.toBufferedImage(frame));

        results = this.predictor.predict(img);

        System.out.println(results);

        logger.info(String.format("Yolov5 inference finishes, the result: %s", results));
    } catch (TranslateException e) {
        throw new RuntimeException(e);
    } finally {
        old.release();
        frame.release();
    }
    // building and returning the DetectResult is omitted here
}

Environment Info

<dependencies>
        <dependency>
            <groupId>ai.djl.pytorch</groupId>
            <artifactId>pytorch-native-cpu</artifactId>
            <classifier>win-x86_64</classifier>
            <scope>runtime</scope>
            <version>2.0.1</version>
        </dependency>
        <dependency>
            <groupId>ai.djl.pytorch</groupId>
            <artifactId>pytorch-jni</artifactId>
            <version>2.0.1-0.23.0</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>ai.djl</groupId>
            <artifactId>api</artifactId>
            <version>0.23.0</version>
        </dependency>
        <dependency>
            <groupId>ai.djl.pytorch</groupId>
            <artifactId>pytorch-model-zoo</artifactId>
            <version>0.23.0</version>
        </dependency>
        <dependency>
            <groupId>ai.djl.pytorch</groupId>
            <artifactId>pytorch-engine</artifactId>
            <version>0.23.0</version>
        </dependency>
        <dependency>
            <groupId>ai.djl.pytorch</groupId>
            <artifactId>pytorch-native-cpu-precxx11</artifactId>
            <classifier>linux-x86_64</classifier>
            <version>2.0.1</version>
            <scope>runtime</scope>
        </dependency>

        <dependency>
            <groupId>org.manufacture</groupId>
            <artifactId>opencv</artifactId>
            <version>4.5.2</version>
            <scope>system</scope>
            <systemPath>${project.basedir}/lib/opencv-452.jar</systemPath>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.36</version>
        </dependency>


    </dependencies>

Ubuntu 20.04 + Java 1.8 + continuous Yolov5 inference.

AndyDPesspd avatar Oct 08 '23 06:10 AndyDPesspd

@AndyDPesspd

The code looks fine. Can you show how you invoke the detect() function? Are you creating a Yolov5 object for each inference?

frankfliu avatar Oct 08 '23 12:10 frankfliu

I hold and use Yolov5 as a member variable, and I don't create it repeatedly. Here's my simple test code, which also has the memory leak problem. That's why I think it's a problem in DJL itself.

public class jarTest {
    public static void main(String[] args) throws ModelNotFoundException, MalformedModelException, IOException {
        Yolov5 yolov5 = new Yolov5();
        for (int i = 0; i < 1000000; i++ ) {
            System.out.println(i);
            yolov5.detect("fff", "9", "D:\\jarTest\\s1_pack1_pt13_v0.jpg",
                    "D:\\");

            //yolov5.predictor.close();
           // yolov5.predictor = yolov5.model.newPredictor();
        }
    }
}

And I found that if I close the predictor after each inference and then create a new one, the memory leak grows much more slowly; the leak rate is greatly reduced. But I don't create the Yolov5() object repeatedly.
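Since DJL's Predictor implements AutoCloseable, the close-and-recreate workaround hinted at by the commented-out lines in the test code above can be written with try-with-resources. A minimal sketch; FakePredictor is a made-up stand-in, not DJL API:

```java
// Sketch of the per-inference close-and-recreate pattern.
// FakePredictor is a hypothetical stand-in for DJL's Predictor,
// which also implements AutoCloseable and frees native memory on close().
public class PredictorLifecycle {

    static class FakePredictor implements AutoCloseable {
        boolean closed;

        String predict(String img) {
            return "result for " + img;
        }

        @Override
        public void close() {
            closed = true; // the real Predictor releases native resources here
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            // One predictor per inference: resources are released every pass.
            try (FakePredictor p = new FakePredictor()) {
                System.out.println(p.predict("frame" + i));
            }
        }
    }
}
```

With the real API the try body would hold model.newPredictor(); this trades a little per-call overhead for deterministic release of native memory.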

AndyDPesspd avatar Oct 09 '23 01:10 AndyDPesspd

And I've tested several versions of pytorch-jni, all of which have memory leaks: 1.13.0-0.20.0, 1.10.0-0.15.0, 1.11.0-0.21.0, 1.11.0-0.19.0, 1.13.1-0.21.0, 2.0.1-0.23.0.

AndyDPesspd avatar Oct 09 '23 02:10 AndyDPesspd

I pulled all the variables out into fixed fields and ran predict continuously (just this line of code: results = this.predictor.predict(img);), and there was still a memory leak, which made me confident that it is a problem in DJL itself.

AndyDPesspd avatar Oct 09 '23 03:10 AndyDPesspd

@frankfliu

AndyDPesspd avatar Oct 09 '23 07:10 AndyDPesspd

@AndyDPesspd

The code looks fine. Can you show how you invoke the detect() function? Are you creating a Yolov5 object for each inference?

@frankfliu Please check my reply above. Thank you so much. I think DJL will be great.

AndyDPesspd avatar Oct 09 '23 12:10 AndyDPesspd

I tested the following code, and I didn't see any memory leak:

public class Yolov5 {

    public static void main(String[] args) throws ModelException, IOException, TranslateException {
        String home = System.getProperty("user.home");
        Path imgPath = Paths.get(home, "sample/kitten.jpg");
        Yolov5 yolov5 = new Yolov5();
        for (int i = 0; i < 10000; i++) {
            System.out.println(i);
            yolov5.detect(imgPath);
        }
    }

    private Predictor<Image, DetectedObjects> predictor;

    public Yolov5() throws IOException, ModelException {
        Criteria<Image, DetectedObjects> criteria = Criteria.builder()
                .setTypes(Image.class, DetectedObjects.class)
                .optModelUrls("djl://ai.djl.pytorch/yolov5s")
                .optTranslatorFactory(new YoloV5TranslatorFactory())
                .optEngine("PyTorch")
                .build();
        ZooModel<Image, DetectedObjects> model = criteria.loadModel();
        predictor = model.newPredictor();
    }

    public void detect(Path imgPath) throws IOException, TranslateException {
        Image img = ImageFactory.getInstance().fromFile(imgPath);
        DetectedObjects results = predictor.predict(img);
    }
}

I set JVM options as -Xmx1g -Xms1g to limit the JVM heap, and the system memory consumption is stable at 1.48G - 1.51 G.
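Worth noting for anyone reproducing this: -Xmx only caps the Java heap, while tensors allocated by the PyTorch native library live off-heap. A small stdlib probe therefore shows only the heap portion; a native leak will not appear in it:

```java
// Prints JVM heap usage. Native (off-heap) memory held by PyTorch
// tensors is NOT included, so a native leak will not show up here;
// compare against the process RSS reported by the OS instead.
public class HeapProbe {
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f MB%n", usedHeapBytes() / 1e6);
    }
}
```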

frankfliu avatar Oct 10 '23 04:10 frankfliu

@AndyDPesspd

If you want to use OpenCV to process images, you can use the opencv extension: https://github.com/deepjavalibrary/djl/tree/master/extensions/opencv

frankfliu avatar Oct 10 '23 04:10 frankfliu

First of all, thank you for pointing out the possible problem with OpenCV. After I stopped using my self-compiled OpenCV to read images and pass them to DJL, the memory leak was solved. Strangely, the problem did not disappear immediately after OpenCV was removed: in my experiment 5 hours ago it still leaked several times, and only stopped about an hour ago. (If OpenCV is reintroduced, the leak returns.) This puzzles me, but at least the problem is solved.

AndyDPesspd avatar Oct 11 '23 09:10 AndyDPesspd

@frankfliu Now, however, this brings new problems: the detection precision decreases and an empty array is returned. The change I made was to stop using OpenCV to read images, that is, replacing this code:

old = Imgcodecs.imread(imgPath);
Imgproc.resize(old, frame, new Size(640, 640));
img = ImageFactory.getInstance().fromImage(HighGui.toBufferedImage(frame));
results = this.predictor.predict(img);

replace the above code with the following:

old = ImageFactory.getInstance().fromFile(Paths.get(imgPath));
img = old.resize(640, 640, true);
results = this.predictor.predict(img);

If I use OpenCV to read images, inference is fast and the output is more accurate, but memory leaks. The output is as follows:

image

After OpenCV is removed, the memory leak disappears, but inference is about 0.5 s slower. The output is as follows:

image

Obviously, the detection confidence is reduced (0.67 vs 0.88). In addition, the result contains an empty "defect" array, as follows:

image

Can you tell me why? Why the drop in precision, and why the empty array output? Thank you very much; you've been a great help.

AndyDPesspd avatar Oct 11 '23 09:10 AndyDPesspd

OK, I already know the cause and have found a temporary workaround. It would be better if you can fix the problem at the root, which comes from translator.optSynsetArtifactName(). Best regards.

AndyDPesspd avatar Oct 11 '23 11:10 AndyDPesspd

@AndyDPesspd

I tried with the DJL OpenCV extension and didn't see a memory leak either, so the leak must be related to your OpenCV code somewhere.

Can you elaborate a bit more on translator.optSynsetArtifactName()? What problem are you facing? In our test we get exactly the same result whether or not OpenCV is used.

frankfliu avatar Oct 12 '23 06:10 frankfliu

@AndyDPesspd

I tried with the DJL OpenCV extension and didn't see a memory leak either, so the leak must be related to your OpenCV code somewhere.

Can you elaborate a bit more on translator.optSynsetArtifactName()? What problem are you facing? In our test we get exactly the same result whether or not OpenCV is used.

Yes, the OpenCV was compiled by myself. When I remove it, there are no more memory leaks. And if .optSynsetArtifactName("result.names") is used, an empty array may be output, as shown in the figure above.

AndyDPesspd avatar Oct 12 '23 11:10 AndyDPesspd

Finally, I have one problem left unsolved: the detection precision decreases because I replaced this code:

old = Imgcodecs.imread(imgPath);
Imgproc.resize(old, frame, new Size(640, 640));
img = ImageFactory.getInstance().fromImage(HighGui.toBufferedImage(frame));
results = this.predictor.predict(img);

replace the above code with the following:

old = ImageFactory.getInstance().fromFile(Paths.get(imgPath));
img = old.resize(640, 640, true);
results = this.predictor.predict(img);

I think the drop in precision is because OpenCV reads images in BGR channel order, while ImageFactory.getInstance().fromFile reads them in RGB. Is my guess correct? Do you have a solution? @frankfliu
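The BGR-vs-RGB hypothesis is easy to illustrate in isolation: swapping the red and blue channels of a packed pixel gives the network a genuinely different input. A quick stdlib sketch (not DJL or OpenCV API):

```java
// Illustrates why channel order matters: swapping R and B in a packed
// 0xRRGGBB pixel produces a different value, i.e. a different input tensor.
public class ChannelSwap {
    // Swap the red and blue bytes of a packed 0xRRGGBB int.
    public static int swapRB(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (b << 16) | (g << 8) | r;
    }

    public static void main(String[] args) {
        int red = 0xFF0000;
        // Red becomes blue if the channel order is misinterpreted.
        System.out.printf("0x%06X -> 0x%06X%n", red, swapRB(red));
    }
}
```

A model trained on RGB input that receives BGR frames sees every red/blue feature inverted, which matches the observed confidence drop rather than a hard failure.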

AndyDPesspd avatar Oct 12 '23 12:10 AndyDPesspd

@AndyDPesspd If you use DJL's OpenCV extension, ImageFactory will use OpenCV, and it will handle BGR properly: https://github.com/deepjavalibrary/djl/blob/master/extensions/opencv/src/main/java/ai/djl/opencv/OpenCVImage.java#L138

When you use .optSynsetArtifactName("result.names"), DJL reads this file as a text file, and each line is treated as a classification. It looks like your "result.names" is a binary file lacking \n line feeds.

frankfliu avatar Oct 12 '23 14:10 frankfliu

image

You're right: if I add the OpenCV extension, the accuracy returns to normal. However, the memory leak occurs again. I've done many experiments to prove this, as shown in the table above. The conclusion is: with OpenCV (whether compiled by myself or the DJL OpenCV extension), memory leaks occur; if OpenCV is not used, the precision is reduced. I don't know whether you can reproduce that conclusion, but I've done a lot of experiments by now. So can you help me? Is there a solution with neither memory leaks nor precision degradation? Thank you very much @frankfliu

AndyDPesspd avatar Oct 13 '23 14:10 AndyDPesspd

@AndyDPesspd I tried with the DJL OpenCV extension and didn't see a memory leak either, so the leak must be related to your OpenCV code somewhere. Can you elaborate a bit more on translator.optSynsetArtifactName()? What problem are you facing? In our test we get exactly the same result whether or not OpenCV is used.

Yes, the OpenCV was compiled by myself. When I remove it, there are no more memory leaks. And if .optSynsetArtifactName("result.names") is used, an empty array may be output, as shown in the figure above.

@frankfliu Can you show me your code? How do you use the OpenCV extension? I added the OpenCV extension to the pom and only used these three lines of code:

old = ImageFactory.getInstance().fromFile(Paths.get(imgPath));
img = old.resize(640, 640, true);
results = this.predictor.predict(img);

Nothing more than that, but it still has a memory leak. Is my code missing a critical step? Can you show me yours?

AndyDPesspd avatar Oct 17 '23 06:10 AndyDPesspd

If you use the built-in Yolov5 model from the DJL model zoo, you don't need to resize the image.

The code I posted earlier can use the OpenCV extension. You only need to add the opencv extension to the pom.xml; DJL will pick it up automatically. You can check the Image type: it should be an instance of OpenCVImage.

frankfliu avatar Oct 17 '23 07:10 frankfliu

If you use the built-in Yolov5 model from the DJL model zoo, you don't need to resize the image.

The code I posted earlier can use the OpenCV extension. You only need to add the opencv extension to the pom.xml; DJL will pick it up automatically. You can check the Image type: it should be an instance of OpenCVImage.

I tried to feed the image in directly, and the following error was reported:

RuntimeError: The size of tensor a (684) must match the size of tensor b (80) at non-singleton dimension 3

	at ai.djl.pytorch.jni.PyTorchLibrary.moduleRunMethod(Native Method)
	at ai.djl.pytorch.jni.IValueUtils.forward(IValueUtils.java:53)
	at ai.djl.pytorch.engine.PtSymbolBlock.forwardInternal(PtSymbolBlock.java:145)
	at ai.djl.nn.AbstractBaseBlock.forward(AbstractBaseBlock.java:79)
	at ai.djl.nn.Block.forward(Block.java:127)
	at ai.djl.inference.Predictor.predictInternal(Predictor.java:140)
	at ai.djl.inference.Predictor.batchPredict(Predictor.java:180)
	... 3 more

To check whether the resize function works correctly, I fed in 640x640 images directly instead of resizing. However, it still leaks memory, so I can confirm the leak is caused by OpenCV. Can you give me some advice on how to skip OpenCV, such as how to handle a native BufferedImage and convert it to BGR format? Or does ONNX have a YOLOv5 template worth trying? Thank you very much! @frankfliu
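On the "how to make a BufferedImage BGR" part of the question: plain java.awt can redraw any image into 3-byte BGR layout. Whether the translator ultimately expects RGB or BGR is a separate question, so treat this as an experiment sketch, not a fix:

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Converts any BufferedImage to 3-byte BGR internal layout using only java.awt.
public class ToBgr {
    public static BufferedImage toBgr(BufferedImage src) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_3BYTE_BGR);
        Graphics2D g = out.createGraphics();
        g.drawImage(src, 0, 0, null); // repaints pixels into BGR byte order
        g.dispose();
        return out;
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        src.setRGB(0, 0, 0xFF0000); // pure red
        BufferedImage bgr = toBgr(src);
        // getRGB still returns packed ARGB regardless of internal layout.
        System.out.println(bgr.getType() == BufferedImage.TYPE_3BYTE_BGR);
    }
}
```

Note that getRGB/setRGB always speak packed ARGB; only the internal byte order changes, which matters when raw bytes are handed to a native library.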

AndyDPesspd avatar Oct 17 '23 08:10 AndyDPesspd

The following code works both with and without OpenCV. You don't need to resize the image manually; YoloV5TranslatorFactory picks up the arguments and does everything for you. And I don't see any accuracy problem either.

Changing the engine to "OnnxRuntime" and pointing to an ONNX model should work just as well. We also have a built-in yolov5 in our onnxruntime model zoo.

    public static void main(String[] args) throws ModelException, IOException, TranslateException {
        String home = System.getProperty("user.home");
        Path imgPath = Paths.get(home, "sample/kitten.jpg");
        Yolov5 yolov5 = new Yolov5();
        for (int i = 0; i < 10000; i++) {
            System.out.println(i);
            yolov5.detect(imgPath);
        }
    }

    private Predictor<Image, DetectedObjects> predictor;

    public Yolov5() throws IOException, ModelException {
        String home = System.getProperty("user.home");
        Path path = Paths.get(home, ".djl.ai/cache/repo/model/cv/object_detection/ai/djl/pytorch/yolov5s/0.0.1");
        Criteria<Image, DetectedObjects> criteria = Criteria.builder()
                .setTypes(Image.class, DetectedObjects.class)
                .optModelPath(path)
                .optArgument("width", "640")
                .optArgument("height", "640")
                .optArgument("resize", "true")
                .optArgument("rescale", "true")
                .optArgument("optApplyRatio", "true")
                .optArgument("threshold", "0.4")
                .optTranslatorFactory(new YoloV5TranslatorFactory())
                .optEngine("PyTorch")
                .build();
        ZooModel<Image, DetectedObjects> model = criteria.loadModel();
        predictor = model.newPredictor();
    }

    public void detect(Path imgPath) throws IOException, TranslateException {
        Image img = ImageFactory.getInstance().fromFile(imgPath);
        DetectedObjects results = predictor.predict(img);
    }

frankfliu avatar Oct 17 '23 14:10 frankfliu

Thank you very much, you've been a great help, and thank you for developing DJL, a tool that lets us deploy algorithms on many devices. @frankfliu With your code above, there is no memory leak or precision degradation. That leaves a good question to think about: my last version of the code was concise and standard, so where was the problem? I'll continue to follow up and hope to find out why.

And there's a small problem to be concerned about. Considering your last suggestion: “DJL will read this file as the text file, and each line be treated as a classification. It looks like your "result.names" is binary file and lacking \n and line feed.”

I provided the txt file to replace the binary file, as shown in the following figure. image

However, the problem of returning an empty array still occurs. Do you have any suggestions?

image

So the file needs newline characters, but no extra blank lines? If there is a third (blank) line, a redundant empty entry appears.

image

AndyDPesspd avatar Oct 19 '23 02:10 AndyDPesspd

@AndyDPesspd

Can you share your synset.txt file? You can debug the SynsetLoader code and see why you get this text: https://github.com/deepjavalibrary/djl/blob/master/api/src/main/java/ai/djl/modality/cv/translator/BaseImageTranslator.java#L305

frankfliu avatar Oct 19 '23 14:10 frankfliu

@AndyDPesspd

Can you share your synset.txt file? You can debug the SynsetLoader code and see why you get this text: https://github.com/deepjavalibrary/djl/blob/master/api/src/main/java/ai/djl/modality/cv/translator/BaseImageTranslator.java#L305

@frankfliu Here is my txt file.

synset.txt

AndyDPesspd avatar Oct 20 '23 02:10 AndyDPesspd

@AndyDPesspd

The synset.txt looks fine. Can you create a minimal reproduction project that I can test?

If you only have two classes, you can just set the classes directly:

        YoloV5Translator translator = YoloV5Translator.builder()
                .addTransform(new Resize(640, 640))
                .optRescaleSize(640, 640)
                .optApplyRatio(true)
                .optThreshold(0.4f)
                .optSynset(Arrays.asList("normal", "defect"))
                .build();
        Criteria<Image, DetectedObjects> criteria = Criteria.builder()
                .setTypes(Image.class, DetectedObjects.class)
                .optModelPath(path)
                .optTranslator(translator)
                .optEngine("PyTorch")
                .build();

frankfliu avatar Oct 20 '23 04:10 frankfliu

@AndyDPesspd

The synset.txt looks fine. Can you create a minimal reproduction project that I can test?

If you only have two classes, you can just set the classes directly:

        YoloV5Translator translator = YoloV5Translator.builder()
                .addTransform(new Resize(640, 640))
                .optRescaleSize(640, 640)
                .optApplyRatio(true)
                .optThreshold(0.4f)
                .optSynset(Arrays.asList("normal", "defect"))
                .build();
        Criteria<Image, DetectedObjects> criteria = Criteria.builder()
                .setTypes(Image.class, DetectedObjects.class)
                .optModelPath(path)
                .optTranslator(translator)
                .optEngine("PyTorch")
                .build();

@frankfliu

That's my initial code. However, a memory leak occurs when OpenCV is used with this Translator; if OpenCV is not used, the precision decreases. OK, I'll try to understand SynsetLoader from the source code and debug it, thanks.

AndyDPesspd avatar Oct 24 '23 08:10 AndyDPesspd


@frankfliu Thank you very much. I've figured out why, but I don't know how to change it, because it's your default setting.

If you step inside the SynsetLoader, the following method is actually called:

model.getArtifact(synsetFileName, Utils::readLines)

However, this resolves to readLines(is, false) by default, and because the trim parameter is false, blank lines are kept and counted as classes (as the Javadoc notes).

image

So the question is: how can this default setting be changed?
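The behavior described above can be reproduced with plain java.io; readLines below is a hypothetical re-implementation mimicking the described default, not DJL's Utils.readLines. It shows why a trailing blank line becomes an extra empty class, and how trimming avoids it:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Demonstrates why a trailing blank line yields an extra (empty) class
// when lines are read without trimming, and how filtering fixes it.
public class SynsetLines {
    public static List<String> readLines(String text, boolean trim) throws Exception {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (trim) {
                    line = line.trim();
                    if (line.isEmpty()) {
                        continue; // skip blank lines entirely
                    }
                }
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String synset = "normal\ndefect\n\n"; // file ends with a blank line
        System.out.println(readLines(synset, false)); // [normal, defect, ]
        System.out.println(readLines(synset, true));  // [normal, defect]
    }
}
```

The untrimmed read produces a third, empty class name, which would explain detections being labeled with an empty string.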

AndyDPesspd avatar Oct 30 '23 02:10 AndyDPesspd