Maintain high resolution for object detection and segmentation
Search before asking
- [x] I have searched the YOLOv5 issues and discussions and found no similar questions.
Question
My current problem is that when I input an image with a resolution of 20 MP, YOLO reduces the resolution during inference. Is there a method to keep the full number of pixels of the 20 MP input image for object detection with YOLO? I want to detect rice fields with segmentation and calculate the area of the rice fields (the segmentation results) based on the number of image pixels.
Additional
No response
👋 Hello @pepsodent72, thank you for your interest in YOLOv5 🚀! Maintaining high-resolution image inputs for tasks like object detection and segmentation is an important topic. Please explore our ⭐️ Tutorials for guidance, including Custom Data Training and Tips for Best Training Results.
If this is a ❓ Question, please provide additional details about your use case, such as:
- Specific YOLO model or configuration being used
- Any changes made to preprocessing or inference settings
- Example images or logs that illustrate your issue
If this is a 🐛 Bug Report, we kindly request a minimum reproducible example (MRE) to help us investigate efficiently.
Requirements
Ensure your environment meets the following: Python>=3.8.0 with all requirements.txt installed, including PyTorch>=1.8. To set up:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
Environments
You can run YOLOv5 in any of the following verified environments:
- Notebooks with free GPU
- Google Cloud Deep Learning VM: See GCP Quickstart Guide
- Amazon Deep Learning AMI: See AWS Quickstart Guide
- Docker Image: See Docker Quickstart Guide
Status
If the YOLOv5 CI badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests validate YOLOv5 training, validation, inference, export, and benchmarks.
This is an automated response to help guide you 😊. An Ultralytics engineer will review your issue and provide assistance soon. Thank you for your patience and for using YOLOv5! 🚀
@pepsodent72 to maintain high-resolution segmentation for area calculation in YOLOv5, we recommend two approaches:
- Native Inference: Run inference at full resolution by setting `--imgsz` to your original image dimensions (e.g., `--imgsz 5472 3648` for 20 MP). This requires sufficient GPU memory. Modify your inference command:
python detect.py --weights your_model.pt --source your_image.jpg --imgsz 5472 3648
- Sliding Window Inference: Process the image in tiles using a sliding window approach with `--conf` and `--overlap` adjustments to maintain precision. This is memory-efficient for large images (see the tiling sketch after the snippet below):
from models.common import DetectMultiBackend
model = DetectMultiBackend('your_model.pt')  # loads the weights for any supported backend
# DetectMultiBackend expects a preprocessed image tensor (letterboxed, CHW, normalized to 0-1):
pred = model(im, augment=False)  # im: torch.Tensor of shape (1, 3, H, W)
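As a rough sketch of the tiling-and-merge idea, the snippet below assumes a model loaded via torch.hub; the tile size, overlap, and NMS threshold are illustrative placeholders, and segmentation masks would need the same per-tile offsetting applied:

```python
import cv2
import torch
import torchvision

# Sketch only: load a custom YOLOv5 model via torch.hub and tile a large image manually.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='your_model.pt')
img = cv2.cvtColor(cv2.imread('your_image.jpg'), cv2.COLOR_BGR2RGB)
h, w = img.shape[:2]
tile, overlap = 1280, 256  # hypothetical values; tune for GPU memory and object size

boxes, scores = [], []
for y in range(0, max(h - overlap, 1), tile - overlap):
    for x in range(0, max(w - overlap, 1), tile - overlap):
        det = model(img[y:y + tile, x:x + tile], size=tile).xyxy[0].clone()  # (n, 6): x1, y1, x2, y2, conf, cls
        det[:, [0, 2]] += x  # shift tile-local boxes back to full-image coordinates
        det[:, [1, 3]] += y
        boxes.append(det[:, :4])
        scores.append(det[:, 4])

boxes, scores = torch.cat(boxes), torch.cat(scores)
keep = torchvision.ops.nms(boxes, scores, iou_threshold=0.5)  # merge duplicates from overlapping tiles
merged_boxes = boxes[keep]
```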
For area calculations, ensure your images have geospatial metadata (ground sampling distance) to convert pixel counts to real-world areas. The Ultralytics HUB offers built-in geospatial tools when working with satellite/drone imagery.
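For example, assuming the ground sampling distance (GSD) is known from your drone or satellite metadata (the value below is purely illustrative), converting mask pixels to real-world area is straightforward:

```python
import numpy as np

# Assumptions: `mask` is a binary segmentation mask at the original image resolution,
# and the GSD (metres per pixel) comes from your imagery metadata.
gsd_m_per_px = 0.05                        # hypothetical: 5 cm per pixel
mask = np.zeros((3648, 5472), dtype=bool)  # placeholder mask for a ~20 MP image

area_m2 = mask.sum() * gsd_m_per_px ** 2   # each pixel covers gsd^2 square metres
print(f'Rice field area: {area_m2:.1f} m^2')
```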
For implementation details, see:
- Detection parameters in detect.py
- Sliding window guidance in YOLOv5 documentation
Let us know if you need further assistance with your rice field segmentation project! 🌾
If I run segmentation on a 20 MP input image, what should my training size be? Does it stay at 640x640?
Also, is the sliding window accurate when detecting large objects such as rice fields? And if an object is detected in both frame 1 and frame 2, can the sliding window merge the detection results?
Thank you, and I hope you are always healthy.
For high-resolution 20MP segmentation:
- Training: The default is 640x640, but you can train at higher resolutions (e.g., `--img 1280`) to better match your inference size. This improves detail retention but requires more GPU memory (an example command follows this list).
- Sliding Window: YOLOv5's sliding window (`stride`/`overlap` settings) works well for large objects like rice fields. Overlapping tiles (e.g., `--overlap 0.5`) prevent edge misses, and the built-in NMS merges duplicate detections across tiles automatically.
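For reference, a higher-resolution segmentation training run might look like the command below; the dataset YAML, starting weights, batch size, and epoch count are placeholders to adapt to your own data and GPU memory:

```bash
python segment/train.py --img 1280 --data your_rice_fields.yaml --weights yolov5s-seg.pt --batch-size 8 --epochs 100
```

Memory use grows roughly with the square of `--img`, so lower `--batch-size` as you raise the training resolution.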
For implementation details see the YOLOv5 segmentation tutorial and instance segmentation guide. Let us know if you need further clarification! 🌱
Excuse me, I would like to ask: is there a recommended tiling approach? Which is better for detecting rice fields, whose objects are quite large: a sliding window or SAHI?
And I want to ask about something else. For example, when the tiling process divides one image into 9 frames and there are objects in frames 1, 2, and 4, can the model still recombine them?
For large rice field detection in YOLOv5, we recommend using the native sliding-window approach with the `stride` and `imgsz` parameters. YOLOv5 automatically handles overlapping detections across tiles with NMS (non-maximum suppression) to merge results. For example:
python detect.py --source your_image.jpg --imgsz 2048 --stride 128 # adjust stride/overlap via imgsz-stride ratio
This works best for large objects as overlapping tiles ensure full coverage. SAHI is an alternative but YOLOv5's built-in tiling is optimized for our architecture. Detections across tiles are combined before final output. See YOLOv5 segmentation docs for more details.
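If you do want to try SAHI, a minimal sliced-inference sketch would look roughly like this; it assumes SAHI and the pip `yolov5` package are installed, and the model path, slice size, and overlap ratios are placeholders:

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Assumptions: `pip install sahi yolov5`, and a trained checkpoint at this path.
detection_model = AutoDetectionModel.from_pretrained(
    model_type='yolov5',
    model_path='your_model.pt',
    confidence_threshold=0.4,
    device='cuda:0',  # or 'cpu'
)

# SAHI slices the large image into overlapping tiles, runs inference per tile,
# and merges the per-tile predictions back into full-image coordinates.
result = get_sliced_prediction(
    'your_image.jpg',
    detection_model,
    slice_height=1280,          # illustrative tile size
    slice_width=1280,
    overlap_height_ratio=0.25,  # overlap keeps large objects from being cut at tile edges
    overlap_width_ratio=0.25,
)
print(len(result.object_prediction_list), 'merged predictions')
```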
I have a question
I have an input image with a resolution of 2365x1447 px, and I have three cases:
- When I set imgsz to [640], the image is resized to [640, 416] when it enters the inference process.
- When I set imgsz to [2365, 1447], the image is resized to [1472, 928] when it enters the inference process.
- When I set imgsz to [4000, 3000], the image is resized to [3008, 1856] when it enters the inference process.
Case 3 shows that my laptop can resize and process a [3008, 1856] image, let alone the original [2365, 1447] input. So my question is: can I keep the [2365, 1447] input image at that exact number of pixels when it enters the inference process?
YOLOv5 requires input dimensions to be multiples of the model stride (default=32) for compatibility with the architecture. To maintain your original resolution (2365x1447), first verify it's divisible by 32:
from utils.general import check_img_size
img_size = [2365, 1447]
img_size = check_img_size(img_size, s=32)  # rounds each side up to a stride multiple -> [2368, 1472]
If exact pixel preservation is critical, use these validated dimensions and crop or rescale the outputs post-inference. The model will process the image at the nearest valid size, and you can then align the results to your original resolution.
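As an example of that alignment step, a mask predicted at the inference resolution can be mapped back to the original pixel grid before counting pixels; a minimal OpenCV sketch follows, where the sizes and mask are placeholders and any letterbox padding should be cropped off first:

```python
import cv2
import numpy as np

orig_h, orig_w = 1447, 2365                        # original image size
pred_mask = np.zeros((928, 1472), dtype=np.uint8)  # placeholder: mask at the inference resolution

# Nearest-neighbour resize back to the original grid so pixel counts refer to the 2365x1447 image.
# If letterbox padding was added during preprocessing, crop it off before this step.
mask_full = cv2.resize(pred_mask, (orig_w, orig_h), interpolation=cv2.INTER_NEAREST)
pixel_count = int((mask_full > 0).sum())
```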
I have set img_size = [2365, 1447], but YOLO still resizes the image to a smaller size, namely [1472, 928]. How do I solve this?
YOLOv5 requires input dimensions to be multiples of the model stride (32 for P5 models, 64 for P6 models). Use check_img_size() to validate your size:
from utils.general import check_img_size
img_size = check_img_size([2365, 1447], s=64) # returns valid stride-compatible size
If exact pixel counts are critical, pad/crop post-inference to match your original resolution.
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐