GroundingDINO

It does not work at all :(

Open kopyl opened this issue 2 years ago • 6 comments

My command:

CUDA_VISIBLE_DEVICES=0 python /workspace/GroundingDINO/demo/inference_on_a_image.py \
  -c /workspace/GroundingDINO/groundingdino/config/GroundingDINO_SwinB.cfg.py \
  -p /workspace/groundingdino_swinb_cogcoor.pth \
  -i /workspace/1e72fd7-1hordon-690.png \
  -o "outputs/0" \
  -t "cat ear." 

Input image: image

Output: image

kopyl avatar Apr 11 '23 11:04 kopyl

Correction: it does work, but in a strange, unpredictable way. It would be good to have docs...

kopyl avatar Apr 11 '23 12:04 kopyl

Also:

image

This is not a fork... How does it even work?

kopyl avatar Apr 11 '23 12:04 kopyl

@SlongLiu @SkalskiP @rentainhe @GeorgePearse

Could you please tell me whether there is any way to fine-tune these models on my own data?

Or how to train them from scratch, unless that is very expensive (over $1000 in GPU time)?

kopyl avatar Apr 11 '23 14:04 kopyl

Hey @kopyl, on the first one: the model would still likely score 'human ear' higher than 'cat ear' for that region, so if you prompt with both you can resolve the overlap with something like NMS.
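A minimal sketch of that NMS step, assuming the detections for both prompts have already been collected as (phrase, score, box) tuples. The tuple layout, the helper names and the numbers are made up for illustration and are not GroundingDINO's API:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[str, float, Box]       # (phrase, score, box), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cross_phrase_nms(detections: List[Detection],
                     iou_threshold: float = 0.5) -> List[Detection]:
    """Keep the highest-scoring detection among overlapping boxes,
    regardless of which phrase produced it."""
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(det[2], k[2]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Illustrative values only: two phrases firing on nearly the same region.
detections = [
    ("cat ear", 0.31, (100.0, 50.0, 160.0, 120.0)),
    ("human ear", 0.62, (102.0, 48.0, 158.0, 118.0)),
]
kept = cross_phrase_nms(detections)
```

With these made-up numbers the two boxes overlap heavily, so only the higher-scoring 'human ear' detection survives and the 'cat ear' box is dropped.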

For the second, that's a pretty non-standard image. It's worth trying 'cartoon owl drawing' and seeing whether it scores higher than 'fork'. Models always need to be tuned, and thresholds need to be set separately for different classes.
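One way to apply a separate threshold per class after inference, as a rough sketch. It assumes detections are available as (phrase, score) pairs; the function name, threshold values and scores are hypothetical and would need tuning on your own images:

```python
from typing import Dict, List, Tuple

Detection = Tuple[str, float]  # (phrase, score), hypothetical layout


def filter_by_phrase_threshold(detections: List[Detection],
                               thresholds: Dict[str, float],
                               default_threshold: float = 0.35) -> List[Detection]:
    """Drop detections whose score is below the threshold set for their phrase.

    Phrases without an explicit entry fall back to default_threshold.
    """
    kept = []
    for phrase, score in detections:
        if score >= thresholds.get(phrase, default_threshold):
            kept.append((phrase, score))
    return kept


# Illustrative scores only, not real model output.
detections = [("fork", 0.41), ("cartoon owl drawing", 0.38), ("fork", 0.55)]
thresholds = {"fork": 0.5, "cartoon owl drawing": 0.3}
filtered = filter_by_phrase_threshold(detections, thresholds)
```

Here the weak 'fork' detection at 0.41 is discarded because 'fork' has a stricter threshold, while 'cartoon owl drawing' at 0.38 is kept.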

Fair point that the technique may struggle with 'false positives', but it is an open-set detector: think of it as trying to find examples of the text you've given it rather than as a classifier. If you want better behaviour, give the model alternatives. I would recommend @SkalskiP's YouTube tutorial: https://www.youtube.com/watch?v=cMa77r3YrDk
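A small sketch of building a prompt that offers the model alternatives. It assumes the convention of separating phrases with periods, as in the "cat ear." prompt in the original command; `build_caption` is a hypothetical helper, not part of the GroundingDINO codebase:

```python
from typing import List


def build_caption(phrases: List[str]) -> str:
    """Join candidate phrases into one period-separated, lowercased caption."""
    cleaned = [p.strip().lower().rstrip(".").strip() for p in phrases]
    cleaned = [p for p in cleaned if p]  # skip empty entries
    return " . ".join(cleaned) + " ."


# The alternatives compete for the same region instead of forcing a single label.
caption = build_caption(["cat ear", "human ear"])
```

The resulting string could then be passed as the `-t` argument of the demo script in place of "cat ear.".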

GeorgePearse avatar Apr 11 '23 18:04 GeorgePearse

@GeorgePearse, thanks a lot for the referral 🙏

SkalskiP avatar Apr 11 '23 20:04 SkalskiP

@GeorgePearse thanks :))

kopyl avatar Apr 11 '23 23:04 kopyl