GroundingDINO
It does not work at all :(
My command:
CUDA_VISIBLE_DEVICES=0 python /workspace/GroundingDINO/demo/inference_on_a_image.py \
-c /workspace/GroundingDINO/groundingdino/config/GroundingDINO_SwinB.cfg.py \
-p /workspace/groundingdino_swinb_cogcoor.pth \
-i /workspace/1e72fd7-1hordon-690.png \
-o "outputs/0" \
-t "cat ear."
Input image:

Output:

No, it works, but in a strange, unpredictable way. It would be good to have docs...
Also:

This is not a fork... How does it even work?
@SlongLiu @SkalskiP @rentainhe @GeorgePearse
Could you please tell me whether there is any way to fine-tune these models on my own data?
Or how to train them from scratch, unless that is very expensive (over $1000 of GPU time)?
Hey @kopyl, on the first image: the model would still likely score that box higher for 'human ear' than for 'cat ear', and you can handle that with something like NMS.
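To make the NMS idea concrete, here is a rough sketch of class-agnostic NMS over detections gathered from several prompts. It assumes the boxes have already been converted to absolute `(x1, y1, x2, y2)` pixel coordinates. The `Detection` type, the helper names, and all numbers are made up for illustration and are not part of GroundingDINO.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical container for one detection (not a GroundingDINO type)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    score: float
    phrase: str


def iou(a: Tuple[float, float, float, float],
        b: Tuple[float, float, float, float]) -> float:
    """Intersection over union of two xyxy boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(dets: List[Detection],
                       iou_threshold: float = 0.5) -> List[Detection]:
    """Greedy NMS that ignores the phrase: of any group of overlapping
    boxes, only the highest-scoring one (whatever its phrase) survives."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d.score, reverse=True):
        if all(iou(det.box, k.box) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Illustrative values only: the same region detected under two phrases,
# plus one separate region.
detections = [
    Detection((10, 10, 50, 60), 0.62, "human ear"),
    Detection((12, 11, 52, 58), 0.41, "cat ear"),
    Detection((200, 40, 240, 90), 0.55, "human ear"),
]
kept = class_agnostic_nms(detections)
for det in kept:
    print(det.phrase, det.score)
```

With these invented numbers, the lower-scoring 'cat ear' box is suppressed because it overlaps a higher-scoring 'human ear' box.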
For the second, that's a pretty non-standard image. It's worth trying 'cartoon owl drawing' and seeing whether it scores higher than 'fork'. Models always need to be tuned, and thresholds need to be set, for different classes.
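As a rough illustration of per-class thresholds, something like the following could be applied to the phrases and scores after inference. The function name, the default of 0.35, and every number here are assumptions for the sketch, not values taken from the model or the repo.

```python
from typing import Dict, List, Tuple


def filter_by_phrase_threshold(
    phrases: List[str],
    scores: List[float],
    thresholds: Dict[str, float],
    default: float = 0.35,  # assumed fallback, tune for your data
) -> List[Tuple[str, float]]:
    """Keep a detection only if its score reaches the threshold set for
    its phrase; phrases without an entry use the default threshold."""
    return [
        (phrase, score)
        for phrase, score in zip(phrases, scores)
        if score >= thresholds.get(phrase, default)
    ]


# Illustrative values only: demand more confidence for 'fork', which
# produced a false positive in this thread.
phrases = ["fork", "cartoon owl drawing", "fork"]
scores = [0.38, 0.52, 0.61]
thresholds = {"fork": 0.5}

kept = filter_by_phrase_threshold(phrases, scores, thresholds)
print(kept)
```

Here the weak 'fork' detection is dropped, while the stronger one and the 'cartoon owl drawing' detection pass.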
Fair point that the technique may struggle with false positives, but it is an open-set detector: think of it as trying to find examples of the text you've given it rather than as a classifier. If you want better behaviour, give the model alternatives. I would recommend @SkalskiP's YouTube tutorial: https://www.youtube.com/watch?v=cMa77r3YrDk
@GeorgePearse, thanks a lot for the referral 🙏
@GeorgePearse thanks :))