Maintain high resolution for object detection and segmentation
Search before asking
- [x] I have searched the YOLOv5 issues and discussions and found no similar questions.
Question
My current problem is that when I input an image with a resolution of 20 MP, YOLO reduces the resolution during inference. Is there a method to keep the full number of pixels of the 20 MP input image for object detection with YOLO? I want to detect rice fields with segmentation and calculate the area of the rice fields (the segmentation results) based on the number of image pixels.
Additional
No response
👋 Hello @pepsodent72, thank you for your interest in YOLOv5 🚀! Maintaining high-resolution image inputs for tasks like object detection and segmentation is an important topic. Please explore our ⭐️ Tutorials for guidance, including Custom Data Training and Tips for Best Training Results.
If this is a ❓ Question, please provide additional details about your use case, such as:
- Specific YOLO model or configuration being used
- Any changes made to preprocessing or inference settings
- Example images or logs that illustrate your issue
If this is a 🐛 Bug Report, we kindly request a minimum reproducible example (MRE) to help us investigate efficiently.
Requirements
Ensure your environment meets the following: Python>=3.8.0 with all requirements.txt installed, including PyTorch>=1.8. To set up:
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
Environments
You can run YOLOv5 in any of the following verified environments:
- Notebooks with free GPU
- Google Cloud Deep Learning VM: See GCP Quickstart Guide
- Amazon Deep Learning AMI: See AWS Quickstart Guide
- Docker Image: See Docker Quickstart Guide
Status
If the YOLOv5 CI badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests validate YOLOv5 training, validation, inference, export, and benchmarks.
This is an automated response to help guide you 😊. An Ultralytics engineer will review your issue and provide assistance soon. Thank you for your patience and for using YOLOv5! 🚀
@pepsodent72 to maintain high-resolution segmentation for area calculation in YOLOv5, we recommend two approaches:
- Native Inference: Run inference at full resolution by setting `--imgsz` to your original image dimensions (e.g., `--imgsz 5472 3648` for 20 MP). This requires sufficient GPU memory. Modify your inference command:
python detect.py --weights your_model.pt --source your_image.jpg --imgsz 5472 3648
- Sliding Window Inference: Process the image in tiles using a sliding window approach with `--conf` and `--overlap` adjustments to maintain precision. This is memory-efficient for large images (see the tiling sketch after the snippet below):
from models.common import DetectMultiBackend
model = DetectMultiBackend('your_model.pt')  # loads the weights for any supported backend
# DetectMultiBackend expects a preprocessed image tensor (letterboxed, CHW, normalized to 0-1):
pred = model(im, augment=False)  # im: torch.Tensor of shape (1, 3, H, W)
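As a rough sketch of the tiling-and-merge idea, the snippet below assumes a model loaded via torch.hub; the tile size, overlap, and NMS threshold are illustrative placeholders, and segmentation masks would need the same per-tile offsetting applied:

```python
import cv2
import torch
import torchvision

# Sketch only: load a custom YOLOv5 model via torch.hub and tile a large image manually.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='your_model.pt')
img = cv2.cvtColor(cv2.imread('your_image.jpg'), cv2.COLOR_BGR2RGB)
h, w = img.shape[:2]
tile, overlap = 1280, 256  # hypothetical values; tune for GPU memory and object size

boxes, scores = [], []
for y in range(0, max(h - overlap, 1), tile - overlap):
    for x in range(0, max(w - overlap, 1), tile - overlap):
        det = model(img[y:y + tile, x:x + tile], size=tile).xyxy[0].clone()  # (n, 6): x1, y1, x2, y2, conf, cls
        det[:, [0, 2]] += x  # shift tile-local boxes back to full-image coordinates
        det[:, [1, 3]] += y
        boxes.append(det[:, :4])
        scores.append(det[:, 4])

boxes, scores = torch.cat(boxes), torch.cat(scores)
keep = torchvision.ops.nms(boxes, scores, iou_threshold=0.5)  # merge duplicates from overlapping tiles
merged_boxes = boxes[keep]
```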
For area calculations, ensure your images have geospatial metadata (ground sampling distance) to convert pixel counts to real-world areas. The Ultralytics HUB offers built-in geospatial tools when working with satellite/drone imagery.
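For example, assuming the ground sampling distance (GSD) is known from your drone or satellite metadata (the value below is purely illustrative), converting mask pixels to real-world area is straightforward:

```python
import numpy as np

# Assumptions: `mask` is a binary segmentation mask at the original image resolution,
# and the GSD (metres per pixel) comes from your imagery metadata.
gsd_m_per_px = 0.05                        # hypothetical: 5 cm per pixel
mask = np.zeros((3648, 5472), dtype=bool)  # placeholder mask for a ~20 MP image

area_m2 = mask.sum() * gsd_m_per_px ** 2   # each pixel covers gsd^2 square metres
print(f'Rice field area: {area_m2:.1f} m^2')
```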
For implementation details, see:
- Detection parameters in detect.py
- Sliding window guidance in YOLOv5 documentation
Let us know if you need further assistance with your rice field segmentation project! 🌾
If I run segmentation on a 20 MP input image, what should my training size be? Does it stay at 640x640?
Also, is the sliding window accurate when detecting large objects such as rice fields? And if an object is detected in both frame 1 and frame 2, can the sliding window merge the detection results?
Thank you, and I hope you are always healthy.
For high-resolution 20MP segmentation:
- Training: The default is 640x640, but you can train at higher resolutions (e.g., `--img 1280`) to better match your inference size. This improves detail retention but requires more GPU memory (an example command follows this list).
- Sliding Window: YOLOv5's sliding window (`stride`/`overlap` settings) works well for large objects like rice fields. Overlapping tiles (e.g., `--overlap 0.5`) prevent edge misses, and the built-in NMS merges duplicate detections across tiles automatically.
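For reference, a higher-resolution segmentation training run might look like the command below; the dataset YAML, starting weights, batch size, and epoch count are placeholders to adapt to your own data and GPU memory:

```bash
python segment/train.py --img 1280 --data your_rice_fields.yaml --weights yolov5s-seg.pt --batch-size 8 --epochs 100
```

Memory use grows roughly with the square of `--img`, so lower `--batch-size` as you raise the training resolution.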
For implementation details see the YOLOv5 segmentation tutorial and instance segmentation guide. Let us know if you need further clarification! 🌱
Excuse me, I would like to ask: is there a recommended tiling approach? Which is better for detecting rice fields, whose objects are quite large: a sliding window or SAHI?
And I want to ask about something else. For example, when the tiling process divides one image into 9 frames and there are objects in frames 1, 2, and 4, can the model still recombine them?
For large rice field detection in YOLOv5, we recommend using the native sliding-window approach with the `stride` and `imgsz` parameters. YOLOv5 automatically handles overlapping detections across tiles with NMS (non-maximum suppression) to merge results. For example:
python detect.py --source your_image.jpg --imgsz 2048 --stride 128 # adjust stride/overlap via imgsz-stride ratio
This works best for large objects as overlapping tiles ensure full coverage. SAHI is an alternative but YOLOv5's built-in tiling is optimized for our architecture. Detections across tiles are combined before final output. See YOLOv5 segmentation docs for more details.
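If you do want to try SAHI, a minimal sliced-inference sketch would look roughly like this; it assumes SAHI and the pip `yolov5` package are installed, and the model path, slice size, and overlap ratios are placeholders:

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Assumptions: `pip install sahi yolov5`, and a trained checkpoint at this path.
detection_model = AutoDetectionModel.from_pretrained(
    model_type='yolov5',
    model_path='your_model.pt',
    confidence_threshold=0.4,
    device='cuda:0',  # or 'cpu'
)

# SAHI slices the large image into overlapping tiles, runs inference per tile,
# and merges the per-tile predictions back into full-image coordinates.
result = get_sliced_prediction(
    'your_image.jpg',
    detection_model,
    slice_height=1280,          # illustrative tile size
    slice_width=1280,
    overlap_height_ratio=0.25,  # overlap keeps large objects from being cut at tile edges
    overlap_width_ratio=0.25,
)
print(len(result.object_prediction_list), 'merged predictions')
```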
I have a question
I have an input image with a resolution of 2365x1447 px, and I have three cases:
- When I set imgsz to [640], the image is resized to [640, 416] when it enters the inference process.
- When I set imgsz to [2365, 1447], the image is resized to [1472, 928] when it enters the inference process.
- When I set imgsz to [4000, 3000], the image is resized to [3008, 1856] when it enters the inference process.
Case 3 shows that my laptop can resize and process a [3008, 1856] image, let alone the original [2365, 1447] input. So my question is: can I keep the [2365, 1447] input image at that exact number of pixels when it enters the inference process?
YOLOv5 requires input dimensions to be multiples of the model stride (default=32) for compatibility with the architecture. To maintain your original resolution (2365x1447), first verify it's divisible by 32:
from utils.general import check_img_size
img_size = [2365, 1447]
img_size = check_img_size(img_size, s=32)  # rounds each side up to a stride multiple -> [2368, 1472]
If exact pixel preservation is critical, use these validated dimensions and crop or rescale the outputs post-inference. The model will process the image at the nearest valid size, and you can then align the results to your original resolution.
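As an example of that alignment step, a mask predicted at the inference resolution can be mapped back to the original pixel grid before counting pixels; a minimal OpenCV sketch follows, where the sizes and mask are placeholders and any letterbox padding should be cropped off first:

```python
import cv2
import numpy as np

orig_h, orig_w = 1447, 2365                        # original image size
pred_mask = np.zeros((928, 1472), dtype=np.uint8)  # placeholder: mask at the inference resolution

# Nearest-neighbour resize back to the original grid so pixel counts refer to the 2365x1447 image.
# If letterbox padding was added during preprocessing, crop it off before this step.
mask_full = cv2.resize(pred_mask, (orig_w, orig_h), interpolation=cv2.INTER_NEAREST)
pixel_count = int((mask_full > 0).sum())
```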
I have set img_size = [2365, 1447], but YOLO still resizes the image to a smaller size, namely [1472, 928]. How do I solve this?
YOLOv5 requires input dimensions to be multiples of the model stride (32 for P5 models, 64 for P6 models). Use check_img_size() to validate your size:
from utils.general import check_img_size
img_size = check_img_size([2365, 1447], s=64) # returns valid stride-compatible size
If exact pixel counts are critical, pad/crop post-inference to match your original resolution.
👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.
For additional resources and information, please see the links below:
- Docs: https://docs.ultralytics.com
- HUB: https://hub.ultralytics.com
- Community: https://community.ultralytics.com
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐