GroundingDINO It does not work at all :(

My command:

CUDA_VISIBLE_DEVICES=0 python /workspace/GroundingDINO/demo/inference_on_a_image.py \
  -c /workspace/GroundingDINO/groundingdino/config/GroundingDINO_SwinB.cfg.py \
  -p /workspace/groundingdino_swinb_cogcoor.pth \
  -i /workspace/1e72fd7-1hordon-690.png \
  -o "outputs/0" \
  -t "cat ear."

Input image:

Output:

Apr 11 '23 11:04 kopyl

No, it does work, but in a strange, unpredictable way. It would be good to have docs...

Apr 11 '23 12:04 kopyl

Also:

This is not a fork... How does it even work?

Apr 11 '23 12:04 kopyl

@SlongLiu @SkalskiP @rentainhe @GeorgePearse

Could you please tell me whether there is any way to fine-tune these models on my own data?

Or how to train them from scratch, unless that's very expensive (over $1000 of GPU time)?

Apr 11 '23 14:04 kopyl

Hey @kopyl, on the first: the model would still likely score 'human ear' more highly than 'cat ear', and you can handle that with something like NMS.
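A minimal sketch of that NMS idea, assuming you have already collected boxes, scores and phrases for several prompts (e.g. 'cat ear' and 'human ear'). The detections below are made-up numbers, and `class_agnostic_nms` is a hypothetical helper written for illustration, not part of the GroundingDINO repo:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[Box, float, str]       # (box, score, phrase)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(dets: List[Detection], iou_threshold: float = 0.5) -> List[Detection]:
    """Keep the highest-scoring box in each overlapping group, ignoring the phrase."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        # Keep a box only if it does not overlap a better box that was already kept.
        if all(iou(det[0], k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# Illustrative values only: two prompts firing on the same region, plus one separate box.
detections: List[Detection] = [
    ((100.0, 50.0, 180.0, 150.0), 0.62, "human ear"),
    ((102.0, 48.0, 182.0, 152.0), 0.41, "cat ear"),
    ((300.0, 200.0, 360.0, 260.0), 0.35, "cat ear"),
]
kept = class_agnostic_nms(detections)
for box, score, phrase in kept:
    print(f"{phrase}: {score:.2f} at {box}")
```

Here the weaker 'cat ear' box that sits on top of the 'human ear' box is dropped, while the non-overlapping 'cat ear' box survives.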

For the second, that's a pretty non-standard image. It's worth trying 'cartoon owl drawing' and seeing whether it scores more highly than 'fork', though. Models always need to be tuned, and thresholds set, for different classes.
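As a rough illustration of per-class thresholds, here is a sketch that filters detections with a different cut-off per phrase. The scores and threshold values are invented for the example, and `filter_by_phrase_threshold` is a hypothetical helper, not something the repo provides:

```python
from typing import Dict, List, Tuple

ScoredPhrase = Tuple[str, float]  # (phrase, score)


def filter_by_phrase_threshold(
    dets: List[ScoredPhrase],
    thresholds: Dict[str, float],
    default: float = 0.35,
) -> List[ScoredPhrase]:
    """Keep detections whose score reaches the threshold chosen for their phrase."""
    return [(p, s) for p, s in dets if s >= thresholds.get(p, default)]


# Illustrative values only: 'fork' tends to fire spuriously, so it gets a stricter cut-off.
thresholds = {"fork": 0.5, "cartoon owl drawing": 0.4}
detections = [
    ("cartoon owl drawing", 0.48),
    ("fork", 0.33),
    ("fork", 0.52),
]
filtered = filter_by_phrase_threshold(detections, thresholds)
print(filtered)
```

In practice the thresholds would be picked by looking at scores on a handful of your own images for each phrase.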

Fair point that the technique may struggle with false positives, but it is an open-set detector: think of it as trying to find examples of the text you've given it rather than as a classifier. If you want better behaviour, give the model alternatives. I'd recommend @SkalskiP's YouTube tutorial: https://www.youtube.com/watch?v=cMa77r3YrDk

Apr 11 '23 18:04 GeorgePearse

@GeorgePearse, thanks a lot for the referral 🙏

Apr 11 '23 20:04 SkalskiP

@GeorgePearse thanks :))

Apr 11 '23 23:04 kopyl
