GroundingDINO
Multi-object caption has a negative effect on detection results.
I am using GroundingDINO to detect objects in an image. However, I found that an object is detected with the caption "ping pong." but not with the caption "man. ping pong.". The results are as follows:
- caption: "ping pong" box_threshold=0.3
- caption: "man. ping pong." box_threshold=0.3
- caption: "man. ping pong." box_threshold=0.2
Why does this happen, and how can I solve or mitigate it? Thanks!
I am having a similar issue. Has anyone found a solution?
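One thing that might be worth trying, offered as an untested assumption rather than a confirmed fix: split the multi-object caption into single-phrase captions and run detection once per phrase, so that "ping pong." is scored on its own as in the first result above. The sketch below only shows the splitting and merging logic. `split_caption` and `detect_per_phrase` are hypothetical helper names, and `detect` is a placeholder for whatever function wraps your own GroundingDINO inference call; none of these are part of the GroundingDINO API.

```python
from typing import Callable, List, Tuple

# (x0, y0, x1, y1) box, confidence score, matched phrase
Box = Tuple[float, float, float, float]
Detection = Tuple[Box, float, str]


def split_caption(caption: str) -> List[str]:
    """Split a period-separated caption into single-phrase captions.

    Each returned caption keeps a trailing period, matching the
    "phrase." style used in the captions above.
    """
    return [part.strip() + "." for part in caption.split(".") if part.strip()]


def detect_per_phrase(
    detect: Callable[[str, float], List[Detection]],
    caption: str,
    box_threshold: float = 0.3,
) -> List[Detection]:
    """Run `detect` once per phrase and concatenate the detections.

    `detect` is a hypothetical user-supplied callable taking
    (caption, box_threshold) and returning a list of detections.
    Overlapping boxes from different phrases are not deduplicated here.
    """
    merged: List[Detection] = []
    for single_caption in split_caption(caption):
        merged.extend(detect(single_caption, box_threshold))
    return merged
```

This costs one forward pass per phrase, so it trades speed for keeping each phrase's scores independent of the others. Whether it actually recovers the missing box would need to be checked on the image in question.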