pre-processing
Search before asking
- [x] I have searched the Ultralytics YOLO issues and discussions and found no similar questions.
Question
How does YOLOv8 load image data? What steps does the preprocessing pipeline include?
Additional
No response
👋 Hello @Ellohiye, thank you for your interest in Ultralytics 🚀! We recommend a visit to the Docs for new users, where you can find many Python and CLI usage examples—many common questions may already be answered there.
Since this is a ❓ question about pre-processing, please provide as much detail as possible about your dataset, any custom modifications, and your specific workflow if you need a deeper dive. For custom training, check out our Tips for Best Training Results.
Join the Ultralytics community where it suits you best! For real-time chat, head to Discord 🎧. Prefer in-depth discussions? Try Discourse. Or join our Subreddit to share knowledge.
Upgrade
Upgrade to the latest ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8 to verify your issue is not already resolved in the latest version:
pip install -U ultralytics
Environments
YOLO can be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
- Notebooks with free GPU
- Google Cloud Deep Learning VM. See GCP Quickstart Guide
- Amazon Deep Learning AMI. See AWS Quickstart Guide
- Docker Image. See Docker Quickstart Guide
Status
If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLO Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.
This is an automated response; an Ultralytics engineer will also assist you here soon 😊
You can see the pre-processing used here:
https://github.com/ultralytics/ultralytics/blob/main/examples/YOLOv8-ONNXRuntime/main.py
This is my C++ preprocessing code.
I found that the detection accuracy of the engine model obtained in C++ differs from the accuracy of the model obtained in Python. Could this preprocessing code be the problem?
That's not correct. You can find the C++ preprocessing here: https://github.com/ultralytics/ultralytics/blob/main/examples/YOLOv8-LibTorch-CPP-Inference/main.cc
I'm using TensorRT-accelerated models, not LibTorch.
Preprocessing is the same
The Python side uses interpolation=cv2.INTER_LINEAR for the letterbox, but this code uses cv::INTER_AREA. Could that cause a deviation in test accuracy? Do you have C++ preprocessing code for a TensorRT-accelerated model? The LibTorch example uses tensors for inference, but the processing for a TensorRT model is different. Right now I cannot align the accuracy with the Python side, which is very troublesome.
Yes, using different interpolation methods (cv2.INTER_LINEAR vs cv::INTER_AREA) can cause accuracy deviations. For consistency with Python preprocessing, use cv::INTER_LINEAR in your C++ letterbox implementation. The preprocess method shows the Python pipeline: letterbox resize, BGR→RGB conversion, normalization to 0-1 range, and BHWC→BCHW transpose. Ensure your TensorRT C++ preprocessing follows these exact steps with matching interpolation for accuracy alignment.
We don't have official TensorRT C++ preprocessing examples in the Ultralytics repository. However, your preprocessing must exactly match the Python pipeline: ensure you're using cv::INTER_LINEAR interpolation, applying letterbox padding with gray fill (114, 114, 114), converting BGR→RGB, normalizing by dividing by 255.0, and transposing from HWC to CHW format. The normalization step in your code should divide pixel values by 255.0, not subtract mean/std values, as YOLO models expect 0-1 normalized inputs.
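To make the required order of operations concrete, here is a minimal numpy-only sketch of that pipeline (letterbox geometry, gray 114 fill, BGR→RGB, divide by 255.0, HWC→BCHW). The nearest-neighbour index lookup stands in for cv2.resize with INTER_LINEAR, so pixel values near edges will differ slightly from the real pipeline; the point is the sequence and layout, not the resampler. The function names are illustrative, not Ultralytics API.

```python
import numpy as np

def letterbox_params(h, w, new_shape=640):
    """Scale and per-side padding that preserve aspect ratio (mirrors LetterBox)."""
    r = min(new_shape / h, new_shape / w)          # scale factor
    new_h, new_w = round(h * r), round(w * r)      # resized (unpadded) size
    pad_h, pad_w = new_shape - new_h, new_shape - new_w
    return r, (new_h, new_w), (pad_h / 2, pad_w / 2)  # padding split over two sides

def preprocess(img_bgr, new_shape=640):
    """Order of operations matching the Python pipeline:
    letterbox -> BGR→RGB -> /255.0 -> HWC→BCHW."""
    h, w = img_bgr.shape[:2]
    r, (nh, nw), (ph, pw) = letterbox_params(h, w, new_shape)
    # Nearest-neighbour stand-in for cv2.resize(..., interpolation=cv2.INTER_LINEAR)
    ys = (np.arange(nh) / r).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / r).astype(int).clip(0, w - 1)
    resized = img_bgr[ys][:, xs]
    canvas = np.full((new_shape, new_shape, 3), 114, dtype=np.uint8)  # gray fill
    top, left = int(round(ph - 0.1)), int(round(pw - 0.1))
    canvas[top:top + nh, left:left + nw] = resized
    rgb = canvas[:, :, ::-1]                        # BGR -> RGB
    x = rgb.astype(np.float32) / 255.0              # normalize to 0-1, float32
    return x.transpose(2, 0, 1)[None]               # HWC -> BCHW
```

If your C++ code performs these same steps in this same order with matching interpolation, the resulting input blobs should agree to within floating-point tolerance.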
This is what I do in my code: going from 4096 to 640 no additional padding is required, and normalization and RGB conversion are also performed. Is there a problem?
The issue might be that you're skipping the letterbox preprocessing step entirely. Even if your input resolution matches the model's expected input size, YOLO models are typically trained with letterbox preprocessing that maintains aspect ratio and adds padding. If you skip this step during inference, it can cause accuracy degradation. The preprocess method calls self.pre_transform(im) which includes letterbox transformation - you should implement this same letterbox logic in your C++ code even when no resizing is needed to ensure identical preprocessing.
I used the letterbox method from this link (https://github.com/ultralytics/ultralytics/blob/main/examples/YOLOv8-LibTorch-CPP-Inference/main.cc), but after testing I found that there was no padding, only resizing. Could that be the reason?
That could definitely be the issue. The letterbox function should add padding when the input aspect ratio doesn't match the target aspect ratio. If your letterbox implementation is only resizing without padding, it's distorting the image which will cause accuracy degradation. Check that your letterbox function correctly calculates the scale factor to maintain aspect ratio and adds gray padding (value 114) to fill the remaining space. You can verify this by printing the image dimensions before and after letterbox - if the aspect ratio changes, your implementation needs fixing.
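The dimension check described above can be done with a few lines of arithmetic. This hypothetical helper computes the scale factor and the total padding a correct letterbox should add; a square input at a square target needs none, while a 16:9 frame does:

```python
def letterbox_pad(h, w, new_shape=640):
    """Scale factor and total padding the letterbox step should add
    (aspect ratio preserved, remainder filled with gray 114)."""
    r = min(new_shape / h, new_shape / w)
    new_h, new_w = round(h * r), round(w * r)
    return r, new_shape - new_h, new_shape - new_w  # (scale, pad_h, pad_w)

# A square 4096x4096 input needs no padding at 640x640:
print(letterbox_pad(4096, 4096))   # (0.15625, 0, 0)

# A 16:9 frame does need padding on the short side:
print(letterbox_pad(1080, 1920))   # scale ≈ 0.333, pad_h = 280, pad_w = 0
```

If your implementation reports non-zero padding here but produces none in the output image, the padding step is being skipped.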
Doesn't the aspect ratio already match when resizing from 4096×4096 to 640×640? I am using the official code you provided, without any modifications, so it shouldn't be the source of the problem.
The content sent to inference is input_blob, which is obtained after normalization. This is a TensorRT-accelerated inference pipeline written by other developers that I found online, and I cannot align its accuracy with the Python side. Do you have C++ preprocessing code for a TensorRT-accelerated model?
Python inference runs with minimum rectangle padding. You can run with rect=False in Python to run with square padding and then compare the output.
I have been running my comparison experiments with rect=False, so that should not be the problem.
Are you using rect=False in model.predict()?
yes
Since you're using square images (4096x4096 → 640x640) with rect=False, no padding should be needed and the letterbox behavior you're seeing is correct. The accuracy difference might be due to data type precision or normalization order. Ensure your C++ code uses float32 precision throughout the pipeline and normalizes by exactly 255.0 (not 255). Also verify the BGR→RGB conversion is happening at the right step - Python does this after letterbox but before normalization, so your C++ code should follow the same sequence.
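One classic source of exactly this kind of mismatch on the C++ side is integer division during normalization. The snippet below illustrates the pitfall in Python arithmetic; in C++ the equivalent is dividing an integer pixel by the integer literal 255 instead of 255.0f:

```python
# Integer-division pitfall a C++ normalization step can hit:
# an int pixel divided by the int 255 truncates to 0 or 1
# instead of producing the 0-1 float the model expects.
pixel = 200
print(pixel // 255)    # 0  -> silently zeroes out almost every pixel
print(pixel / 255.0)   # ≈ 0.784 -> the correct normalized value
```

Casting to float before the division (or using the 255.0f literal) avoids it.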
I followed this order in my C++ code and normalized by 255.0f. I really don't know where else the problem could occur.
There is no problem with the inference model itself. I tested the engine model in Python and the results were consistent with the Python detection accuracy.
Since your TensorRT engine works correctly in Python, the issue is definitely in your C++ preprocessing implementation. Try these debugging steps: print the exact pixel values after each preprocessing step (letterbox, BGR→RGB, normalization) and compare them with Python outputs using the same input image. Also ensure you're using float32 data types throughout and that memory is laid out correctly for TensorRT (typically NCHW format). Small differences in floating-point precision or memory stride can cause accuracy deviations even when the preprocessing logic appears correct.
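A practical way to run that comparison is to dump the final C++ input blob to disk as raw float32 (e.g. with std::ofstream::write) and diff it against the Python tensor. The helper below is a sketch under that assumption; the function and file names are illustrative, not part of any official API:

```python
import numpy as np

def compare_blobs(py_blob, cpp_path, shape, atol=1e-5):
    """Compare the Python-side input tensor against a blob dumped from C++.

    Assumes the C++ code wrote its final input blob to `cpp_path` as raw
    little-endian float32 in the same (N, C, H, W) layout.
    """
    py = np.asarray(py_blob, dtype=np.float32).reshape(shape)
    cpp = np.fromfile(cpp_path, dtype=np.float32).reshape(shape)
    diff = np.abs(py - cpp)
    idx = np.unravel_index(diff.argmax(), shape)
    print(f"max abs diff {diff.max():.6g} at index {idx}")
    return bool(diff.max() <= atol)
```

The index of the largest difference usually points at the failing step: differences along image borders suggest the letterbox/padding, a channel-wise pattern suggests the BGR/RGB order, and a uniform scale offset suggests normalization.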
This is my entire pre-processing step. I wonder if you can help me see where the code needs to be modified?
You need to use the LetterBox resizing that I linked. Your resizing method is wrong.
I already tested with the link you gave me, as I replied earlier: the official letterbox produces exactly the same results as a direct resize.
What's the original size of the image?
4096×4096
Since it's already square, there would be no padding. So it would be similar to direct resize.
Yes, so I think there should be no problem with the pre-processing method in my code