Grounded-Segment-Anything
Different behaviour for same prompt
I loaded the Grounding DINO model as mentioned in the notebook and gave it a prompt with multiple classes: "cycle,car,person,traffic light". For some reason it detected only the cycle and nothing else. But when I gave cycle, car, and person individually, it was able to detect all of them. I am attaching the images below.
Prompt Input
TEXT_PROMPT = "cycle"
BOX_TRESHOLD = 0.25
TEXT_TRESHOLD = 0.25
image_source, image = load_image(local_image_path)
output of boxes
tensor([[0.1187, 0.5970, 0.2354, 0.3133]])
Output :
For the prompt car:
TEXT_PROMPT = "car"
BOX_TRESHOLD = 0.25
TEXT_TRESHOLD = 0.25
image_source, image = load_image(local_image_path)
boxes output
tensor([[0.9301, 0.4329, 0.1382, 0.2682],
        [0.3839, 0.3882, 0.2696, 0.2441],
        [0.7280, 0.3345, 0.4953, 0.4068]])
But when I give the combination of all three:
TEXT_PROMPT = "car,person,cycle"
BOX_TRESHOLD = 0.25
TEXT_TRESHOLD = 0.25
image_source, image = load_image(local_image_path)
boxes value
tensor([[0.3826, 0.3880, 0.2707, 0.2466]])
Changing the order so that person comes first:
TEXT_PROMPT = "person,car,cycle"
BOX_TRESHOLD = 0.25
TEXT_TRESHOLD = 0.25
image_source, image = load_image(local_image_path)
boxes value
tensor([[0.3775, 0.5203, 0.1755, 0.7450]])
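A note for reading the box tensors above: as far as I know, the Grounding DINO inference helpers return boxes as normalized (cx, cy, w, h), that is, a center point plus width and height as fractions of the image size. The sketch below converts one row to pixel corner coordinates under that assumption. The function name `cxcywh_to_xyxy` and the 1280x720 image size are made up for illustration; the real image size is not given in the post.

```python
def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    x0 = (cx - w / 2) * image_width
    y0 = (cy - h / 2) * image_height
    x1 = (cx + w / 2) * image_width
    y1 = (cy + h / 2) * image_height
    return (x0, y0, x1, y1)


# The single box returned for the prompt "person,car,cycle" in the post above.
person_first_box = [0.3775, 0.5203, 0.1755, 0.7450]

# Assumed image size, for illustration only.
print(cxcywh_to_xyxy(person_first_box, image_width=1280, image_height=720))
```

Seen this way, the last box is tall and narrow (about 18% of the width, 75% of the height), which is at least consistent with a person rather than a car or cycle.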
I hope the error can be replicated. I am putting the other scenario in the comments.
Next, I tried loading the model as mentioned in the notebook, and here the results were good. For example, the combined prompt gave the following:
TEXT_PROMPT = "person,car,cycle"
BOX_TRESHOLD = 0.25
TEXT_TRESHOLD = 0.25
image_source, image = load_image(local_image_path)
boxes value
tensor([[0.3785, 0.5197, 0.1796, 0.7477],
        [0.0833, 0.5734, 0.1632, 0.6694],
        [0.1197, 0.5559, 0.2346, 0.7366],
        [0.1201, 0.5977, 0.2360, 0.3183],
        [0.3807, 0.3886, 0.2650, 0.2493]])
Everything was kept the same except how the model was loaded, and the results were quite different.
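To put a number on "quite different", one option is to match boxes between the two runs by IoU and count how many boxes from the good run have no counterpart in the other. This is only a sketch in plain Python: it assumes the boxes are normalized (cx, cy, w, h), and the helper names and the 0.5 threshold are my own choices, not part of the library.

```python
def to_corners(box):
    """Normalized (cx, cy, w, h) -> normalized (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(box_a, box_b):
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ax0, ay0, ax1, ay1 = to_corners(box_a)
    bx0, by0, bx1, by1 = to_corners(box_b)
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def unmatched(reference, candidate, threshold=0.5):
    """Boxes in `reference` with no box in `candidate` at IoU >= threshold."""
    return [r for r in reference if all(iou(r, c) < threshold for c in candidate)]


# Boxes for "person,car,cycle" from the two ways of loading the model (copied from the post).
first_load = [[0.3775, 0.5203, 0.1755, 0.7450]]
second_load = [
    [0.3785, 0.5197, 0.1796, 0.7477],
    [0.0833, 0.5734, 0.1632, 0.6694],
    [0.1197, 0.5559, 0.2346, 0.7366],
    [0.1201, 0.5977, 0.2360, 0.3183],
    [0.3807, 0.3886, 0.2650, 0.2493],
]

missing = unmatched(second_load, first_load)
print(f"{len(missing)} of {len(second_load)} boxes are missing from the first load")
```

The one box that both runs share is almost identical, which suggests the two setups agree on that detection and differ in which other boxes survive, rather than in where the boxes land.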
Have you tested replacing the , with .? Like this: TEXT_PROMPT = "person . car . cycle"
No, I only tested it using , as the separator.
I see similar behavior from the model even with '.' as the separator.
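For anyone repeating the separator test, a small helper can rewrite a comma-separated prompt into the '.'-separated form suggested above, so both variants come from the same class list. This is a sketch only: `to_dot_prompt` is a hypothetical name, and the lowercasing and trailing '.' follow what I understand the Grounding DINO caption preprocessing to do, which is an assumption worth checking against your installed version.

```python
def to_dot_prompt(prompt):
    """Rewrite 'person,car,cycle' as 'person . car . cycle .'.

    Accepts ',' or '.' as the incoming separator and drops empty entries.
    """
    names = [name.strip().lower() for name in prompt.replace(".", ",").split(",")]
    names = [name for name in names if name]
    if not names:
        return ""
    return " . ".join(names) + " ."


# The multi-class prompt from the original post.
TEXT_PROMPT = to_dot_prompt("cycle,car,person,traffic light")
print(TEXT_PROMPT)
```

Multi-word classes such as "traffic light" stay together because only the separators are rewritten. As the last comment notes, changing the separator alone did not remove the behavior, so this only helps rule the separator out.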