Multi-object caption has a negative effect on detection results.

Open • hotelll opened this issue 1 year ago • 1 comment

I am using GroundingDINO to detect objects in an image. However, I found that an object can be detected with the caption "ping pong." but not with the caption "man. ping pong.". The results are as follows:

caption: "ping pong", box_threshold=0.3
caption: "man. ping pong.", box_threshold=0.3
caption: "man. ping pong.", box_threshold=0.2
(result images not preserved)
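The three runs differ only in the caption and in box_threshold. Assuming box filtering works like GroundingDINO's inference helper (each query's score is its highest per-token score against the caption, and the box is kept only if that score exceeds box_threshold), a caption change that pushes the object's best token score below 0.3 drops the box even though it would still pass at 0.2. A minimal plain-Python sketch of that rule; the function name and data layout are hypothetical, not the library's API:

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized


def filter_queries(
    token_scores: Sequence[Sequence[float]],
    boxes: Sequence[Box],
    box_threshold: float,
) -> List[Tuple[Box, float]]:
    """Keep a query's box if its best per-token score exceeds box_threshold.

    token_scores[i][j] is the (already sigmoid-ed) score of query i against
    caption token j. Assumed layout, for illustration only.
    """
    kept: List[Tuple[Box, float]] = []
    for scores, box in zip(token_scores, boxes):
        best = max(scores)  # the query's score is its strongest token match
        if best > box_threshold:
            kept.append((box, best))
    return kept
```

Under this rule, lowering the threshold can only add boxes, never remove them, which is why a lower box_threshold is the first knob to try when a multi-phrase caption loses an object.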

I wonder why this happens and how to solve or mitigate it. Thanks!
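One possible workaround (not confirmed anywhere in this thread) is to query each phrase in its own caption, e.g. "man." and "ping pong." separately, then merge the per-phrase detections with non-maximum suppression. The sketch below assumes a hypothetical `detect_fn(caption, box_threshold)` that you would write to wrap your own GroundingDINO call and return `(box, score)` pairs with boxes as `(x0, y0, x1, y1)`; nothing here is part of the GroundingDINO API.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Detection = Tuple[Box, float, str]       # (box, score, phrase)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-format boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], iou_threshold: float) -> List[Detection]:
    """Greedy NMS: keep the highest-scoring box, drop heavy overlaps."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(det[0], k[0]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


def detect_per_phrase(
    detect_fn: Callable[[str, float], List[Tuple[Box, float]]],  # hypothetical wrapper
    phrases: List[str],
    box_threshold: float = 0.3,
    iou_threshold: float = 0.5,
) -> List[Detection]:
    """Run one single-phrase caption per call and merge the results."""
    result: List[Detection] = []
    for phrase in phrases:
        # Each call sees only one phrase, so other phrases cannot compete with it.
        dets = [(box, score, phrase) for box, score in detect_fn(phrase + ".", box_threshold)]
        # NMS within each phrase, so a "man" box never suppresses a "ping pong" box.
        result.extend(nms(dets, iou_threshold))
    return result
```

The trade-off is one forward pass per phrase instead of one for the whole caption. If PyTorch is already in use, `torchvision.ops.nms` can replace the hand-written `nms`.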

May 08 '24 11:05 hotelll

I am having a similar issue. Has anyone found a solution?

Feb 21 '25 03:02 yunbinmo