
pre-processing

Open Ellohiye opened this issue 8 months ago • 118 comments

Search before asking

  • [x] I have searched the Ultralytics YOLO issues and discussions and found no similar questions.

Question

How does YOLOv8 load image data? What steps does the pre-processing pipeline include?

Additional

No response

Ellohiye avatar Jul 25 '25 16:07 Ellohiye

👋 Hello @Ellohiye, thank you for your interest in Ultralytics 🚀! We recommend a visit to the Docs for new users, where you can find many Python and CLI usage examples—many common questions may already be answered there.

Since this is a ❓ question about pre-processing, please provide as much detail as possible about your dataset, any custom modifications, and your specific workflow if you need a deeper dive. For custom training, check out our Tips for Best Training Results.

Join the Ultralytics community where it suits you best! For real-time chat, head to Discord 🎧. Prefer in-depth discussions? Try Discourse. Or join our Subreddit to share knowledge.

Upgrade

Upgrade to the latest ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8 to verify your issue is not already resolved in the latest version:

pip install -U ultralytics

Environments

YOLO can be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

Ultralytics CI

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLO Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.


This is an automated response; an Ultralytics engineer will also assist you here soon 😊

UltralyticsAssistant avatar Jul 25 '25 16:07 UltralyticsAssistant

You can see the pre-processing used here:

https://github.com/ultralytics/ultralytics/blob/main/examples/YOLOv8-ONNXRuntime/main.py

Y-T-G avatar Jul 25 '25 16:07 Y-T-G

This is the code for pre-processing in C++ code.

[screenshot: C++ pre-processing code]

It is found that the engine model detection accuracy obtained in C++ code is different from the model detection accuracy obtained in Python code. Is it a problem with this pre-processing code?

Ellohiye avatar Jul 26 '25 01:07 Ellohiye

That's not correct. You can find the C++ preprocessing here: https://github.com/ultralytics/ultralytics/blob/main/examples/YOLOv8-LibTorch-CPP-Inference/main.cc

Y-T-G avatar Jul 26 '25 06:07 Y-T-G

I use TensorRT-accelerated models, not LibTorch.

Ellohiye avatar Jul 26 '25 07:07 Ellohiye

Preprocessing is the same

Y-T-G avatar Jul 26 '25 08:07 Y-T-G

The Python side uses interpolation=cv2.INTER_LINEAR for the letterbox, but the code here uses cv::INTER_AREA. Could that cause a deviation in test accuracy? Do you have C++ preprocessing code for the TensorRT-accelerated model? The LibTorch example runs inference on tensors, but the processing for the TensorRT model is different. Right now I cannot align the accuracy with the Python side, which is very troublesome.

Ellohiye avatar Jul 28 '25 16:07 Ellohiye

Yes, using different interpolation methods (cv2.INTER_LINEAR vs cv::INTER_AREA) can cause accuracy deviations. For consistency with Python preprocessing, use cv::INTER_LINEAR in your C++ letterbox implementation. The preprocess method shows the Python pipeline: letterbox resize, BGR→RGB conversion, normalization to 0-1 range, and BHWC→BCHW transpose. Ensure your TensorRT C++ preprocessing follows these exact steps with matching interpolation for accuracy alignment.
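The pipeline described above (letterbox resize, BGR→RGB, 0-1 normalization, BHWC→BCHW) hinges on the letterbox geometry. As a hedged illustration only — not the actual Ultralytics implementation — the scale-and-padding calculation reduces to a few lines of pure Python (function name and the `new_shape` default are assumptions for this sketch):

```python
def letterbox_params(h, w, new_shape=(640, 640)):
    """Compute letterbox scale and padding: scale to fit inside new_shape
    while keeping aspect ratio, then pad the remainder evenly on both
    sides (the padding is filled with gray 114,114,114 in the real code)."""
    r = min(new_shape[0] / h, new_shape[1] / w)          # fitting scale
    new_w, new_h = round(w * r), round(h * r)            # resized content size
    dw, dh = new_shape[1] - new_w, new_shape[0] - new_h  # total padding
    return r, (new_w, new_h), (dw / 2, dh / 2)           # half-padding per side
```

For example, a 1280×720 frame scales by 0.5 to 640×360 content with 140 px of padding top and bottom, while a square input needs no padding at all.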

glenn-jocher avatar Jul 29 '25 02:07 glenn-jocher

[screenshot: TensorRT inference code] The content sent to inference is input_blob, which is obtained after normalization. This is a TensorRT-accelerated inference pipeline written by other developers and found online. I cannot match the accuracy of the Python side. Do you have C++ preprocessing code for the TensorRT-accelerated model? [screenshot: normalization code]

Ellohiye avatar Jul 29 '25 03:07 Ellohiye

We don't have official TensorRT C++ preprocessing examples in the Ultralytics repository. However, your preprocessing must exactly match the Python pipeline: ensure you're using cv::INTER_LINEAR interpolation, applying letterbox padding with gray fill (114, 114, 114), converting BGR→RGB, normalizing by dividing by 255.0, and transposing from HWC to CHW format. The normalization step in your code should divide pixel values by 255.0, not subtract mean/std values, as YOLO models expect 0-1 normalized inputs.
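The channel-order and normalization steps above can be made concrete. This is a minimal pure-Python sketch for a tiny nested-list image (real code would use cv2/NumPy; the function name is an assumption), showing BGR→RGB reordering, HWC→CHW transposition, and division by 255.0 with no mean/std subtraction:

```python
def preprocess_pixels(img_bgr_hwc):
    """Convert a BGR HWC uint8 image (nested lists) to an RGB CHW float
    tensor normalized to [0, 1] by dividing by 255.0 -- no mean/std
    subtraction, matching the 0-1 input range described above."""
    h, w = len(img_bgr_hwc), len(img_bgr_hwc[0])
    # RGB channel planes come from BGR indices (2, 1, 0)
    return [[[img_bgr_hwc[y][x][c] / 255.0 for x in range(w)] for y in range(h)]
            for c in (2, 1, 0)]
```

A 1×1 BGR pixel [0, 128, 255] thus becomes three 1×1 planes: R=1.0, G≈0.502, B=0.0.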

glenn-jocher avatar Jul 30 '25 00:07 glenn-jocher

This is what I do in my code: going from 4096 to 640 requires no additional padding, and normalization and RGB conversion are also performed. Is there a problem?

Ellohiye avatar Jul 30 '25 01:07 Ellohiye

The issue might be that you're skipping the letterbox preprocessing step entirely. Even if your input resolution matches the model's expected input size, YOLO models are typically trained with letterbox preprocessing that maintains aspect ratio and adds padding. If you skip this step during inference, it can cause accuracy degradation. The preprocess method calls self.pre_transform(im) which includes letterbox transformation - you should implement this same letterbox logic in your C++ code even when no resizing is needed to ensure identical preprocessing.

glenn-jocher avatar Jul 30 '25 06:07 glenn-jocher

I used the letterbox method from this link (https://github.com/ultralytics/ultralytics/blob/main/examples/YOLOv8-LibTorch-CPP-Inference/main.cc), but after testing I found that there was no padding, only resizing. Could that be the reason?

Ellohiye avatar Jul 30 '25 08:07 Ellohiye

That could definitely be the issue. The letterbox function should add padding when the input aspect ratio doesn't match the target aspect ratio. If your letterbox implementation is only resizing without padding, it's distorting the image which will cause accuracy degradation. Check that your letterbox function correctly calculates the scale factor to maintain aspect ratio and adds gray padding (value 114) to fill the remaining space. You can verify this by printing the image dimensions before and after letterbox - if the aspect ratio changes, your implementation needs fixing.
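The "print the dimensions before and after" check suggested above can be automated. A hedged sketch (helper name is an assumption): compare the aspect ratio of the original image with that of the resized *content region* (i.e. the output size minus padding); a correct letterbox preserves it, a plain stretch-resize does not:

```python
def letterbox_preserves_aspect(orig_hw, content_hw, tol=0.01):
    """True if the resized content region keeps the original aspect ratio.
    content_hw is the unpadded size after letterbox; a stretch-resize that
    fills the whole 640x640 square from a non-square source will fail this."""
    (h0, w0), (h1, w1) = orig_hw, content_hw
    return abs(h0 / w0 - h1 / w1) <= tol
```

For a 1280×720 source, a 640×360 content region passes, while stretching straight to 640×640 fails the check.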

glenn-jocher avatar Jul 31 '25 08:07 glenn-jocher

Does the aspect ratio not already match when resizing the image from 4096×4096 to 640×640? I am using the official code you provided, without any modifications, so the problem shouldn't be there.

Ellohiye avatar Jul 31 '25 10:07 Ellohiye


Python inference runs with minimum rectangle padding. You can run with rect=False in Python to run with square padding and then compare the output.

Y-T-G avatar Jul 31 '25 11:07 Y-T-G

I have been running my comparative experiments with rect=False set, so that should not be the problem.

Ellohiye avatar Jul 31 '25 14:07 Ellohiye

Are you using rect=False in model.predict()?

Y-T-G avatar Jul 31 '25 15:07 Y-T-G

yes

Ellohiye avatar Aug 01 '25 01:08 Ellohiye

Since you're using square images (4096x4096 → 640x640) with rect=False, no padding should be needed and the letterbox behavior you're seeing is correct. The accuracy difference might be due to data type precision or normalization order. Ensure your C++ code uses float32 precision throughout the pipeline and normalizes by exactly 255.0 (not 255). Also verify the BGR→RGB conversion is happening at the right step - Python does this after letterbox but before normalization, so your C++ code should follow the same sequence.
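The "255.0, not 255" point above matters because in C++ dividing an integer pixel value by the integer literal 255 truncates instead of normalizing. The same pitfall can be mimicked in Python with integer division:

```python
# Pitfall mirrored from C++ integer division: pixel / 255 with integer
# operands truncates every value below 255 down to 0.
pixel = 200
wrong = pixel // 255    # integer division: truncates to 0
right = pixel / 255.0   # float division: the intended 0-1 normalization
```

In C++ the fix is the same: divide by `255.0f` (or cast the pixel to float first) so the result stays in floating point.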

glenn-jocher avatar Aug 01 '25 10:08 glenn-jocher

I followed this order in my C++ code and normalized by 255.0f. I really don't know where else the problem could be.

Ellohiye avatar Aug 01 '25 12:08 Ellohiye

There is no problem with the inference model itself: I tested the engine model in Python and the results were consistent with the Python detection accuracy.

Ellohiye avatar Aug 01 '25 12:08 Ellohiye

Since your TensorRT engine works correctly in Python, the issue is definitely in your C++ preprocessing implementation. Try these debugging steps: print the exact pixel values after each preprocessing step (letterbox, BGR→RGB, normalization) and compare them with Python outputs using the same input image. Also ensure you're using float32 data types throughout and that memory is laid out correctly for TensorRT (typically NCHW format). Small differences in floating-point precision or memory stride can cause accuracy deviations even when the preprocessing logic appears correct.
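One cheap way to do the per-step comparison suggested above is to fingerprint the flattened preprocessed buffer in both pipelines and diff the summaries. A hedged sketch (helper name is an assumption; the C++ side would compute the same statistics over its input blob):

```python
def tensor_summary(buf):
    """Cheap fingerprint of a preprocessed buffer (e.g. the flattened CHW
    float input blob). Identical preprocessing in Python and C++ should
    produce summaries that agree to float32 rounding."""
    n = len(buf)
    return {"n": n, "min": min(buf), "max": max(buf), "mean": sum(buf) / n}
```

If the element count differs, the layout (NCHW vs NHWC) is wrong; if min/max fall outside [0, 1], normalization is wrong; if only the mean drifts, suspect interpolation or channel order.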

glenn-jocher avatar Aug 01 '25 23:08 glenn-jocher

[screenshot: full pre-processing code] These are my entire pre-processing steps. Could you help me see where the code needs to be modified?

Ellohiye avatar Aug 02 '25 01:08 Ellohiye

You need to use the LetterBox resizing that I linked. Your resizing method is wrong.

Y-T-G avatar Aug 02 '25 06:08 Y-T-G

I have tested it with the link you gave me, as I replied earlier: the results with the official letterbox are exactly the same as the results of a direct resize.

Ellohiye avatar Aug 02 '25 09:08 Ellohiye

What's the original size of the image?

Y-T-G avatar Aug 02 '25 10:08 Y-T-G

4096*4096

Ellohiye avatar Aug 02 '25 14:08 Ellohiye

Since it's already square, there would be no padding. So it would be similar to direct resize.
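The square-input case above can be verified with a couple of lines of arithmetic: for a square source, the fitting scale is identical in both axes, so the letterbox degenerates to a plain resize with zero padding, which is why the two methods give identical results here:

```python
# Letterbox geometry for a square 4096x4096 source into a 640x640 target.
scale = min(640 / 4096, 640 / 4096)                      # same scale both axes
new_w, new_h = round(4096 * scale), round(4096 * scale)  # resized content size
pad_w, pad_h = 640 - new_w, 640 - new_h                  # no gray border needed
```

With zero padding the letterbox and a direct resize differ only in interpolation method, which is why the interpolation flag discussed earlier is still worth matching.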

Y-T-G avatar Aug 02 '25 16:08 Y-T-G

Yes, so I think there should be no problem with the pre-processing method in my code

Ellohiye avatar Aug 03 '25 00:08 Ellohiye