Does YOLO-World support complex queries for object detection?

Open loucif01 opened this issue 11 months ago • 1 comment

Hello YOLO-World team,

I’m working on a project where I need to detect and describe objects in images using complex queries (e.g., "a building with a damaged roof and broken windows" or "a road completely submerged in water"). I’m considering using YOLO-World for this task and would like to confirm if the model supports such complex queries.

Specifically:

1. Can YOLO-World handle natural language prompts that describe multiple attributes of an object (e.g., "a damaged roof with broken windows")?
2. Does it support paragraph-level descriptions for object detection (e.g., "a flooded road with submerged vehicles and debris")?
3. Are there any limitations on the complexity or length of the text prompts?

If YOLO-World does not natively support complex queries, are there any recommended approaches or fine-tuning strategies to achieve this functionality?

Thank you for your time and assistance!

Best regards,

Jan 14 '25 03:01 loucif01

So you want to handle a referring expression comprehension (REC) task with YOLO-World?

Jul 26 '25 14:07 connorye