
Does YOLO-World support complex queries for object detection?

Open · loucif01 opened this issue 11 months ago · 1 comment

Hello YOLO-World team,

I’m working on a project where I need to detect and describe objects in images using complex queries (e.g., "a building with a damaged roof and broken windows" or "a road completely submerged in water"). I’m considering YOLO-World for this task and would like to confirm whether the model supports such queries.

Specifically:

  1. Can YOLO-World handle natural language prompts that describe multiple attributes of an object (e.g., "a damaged roof with broken windows")?
  2. Does it support paragraph-level descriptions for object detection (e.g., "a flooded road with submerged vehicles and debris")?
  3. Are there any limitations on the complexity or length of the text prompts?

If YOLO-World does not natively support complex queries, are there any recommended approaches or fine-tuning strategies to achieve this functionality?

Thank you for your time and assistance!

Best regards,

loucif01 · Jan 14 '25 03:01

It seems you are trying to handle the REC (referring expression comprehension) task with YOLO-World?

connorye · Jul 26 '25 14:07