Grounded-Segment-Anything

Different behaviour for same prompt

Open TestPrab opened this issue 1 year ago • 2 comments

I loaded the Grounding DINO model as described in the notebook and passed several classes in a single prompt: "cycle,car,person,traffic light". For some reason it detected only the cycle and nothing else. When I passed cycle, car, and person as individual prompts, it detected all of them. I am attaching the images below.

Prompt Input

TEXT_PROMPT = "cycle"
BOX_TRESHOLD = 0.25
TEXT_TRESHOLD = 0.25

image_source, image = load_image(local_image_path) 

Boxes output:
tensor([[0.1187, 0.5970, 0.2354, 0.3133]]) 

Output: image


For the prompt "car":

TEXT_PROMPT = "car"
BOX_TRESHOLD = 0.25
TEXT_TRESHOLD = 0.25

image_source, image = load_image(local_image_path)

boxes output
tensor([[0.9301, 0.4329, 0.1382, 0.2682],
        [0.3839, 0.3882, 0.2696, 0.2441],
        [0.7280, 0.3345, 0.4953, 0.4068]])

image


But when I give all three in a single prompt:

TEXT_PROMPT = "car,person,cycle"
BOX_TRESHOLD = 0.25
TEXT_TRESHOLD = 0.25

image_source, image = load_image(local_image_path)

boxes value
tensor([[0.3826, 0.3880, 0.2707, 0.2466]]) 

image
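One way to tell which detection survived in the combined prompt is to compare the boxes numerically. The sketch below assumes the posted tensors are normalized (cx, cy, w, h) boxes; the helper names `cxcywh_to_xyxy` and `iou` are made up for illustration and are not part of the library. Under that assumption, the single box from "car,person,cycle" overlaps almost entirely with the second box from the "car" prompt.

```python
# Assumption: boxes are normalized (cx, cy, w, h), not pixel coordinates.

def cxcywh_to_xyxy(box):
    """Convert a (cx, cy, w, h) box to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(box_a, box_b):
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = cxcywh_to_xyxy(box_a)
    bx1, by1, bx2, by2 = cxcywh_to_xyxy(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


# Values copied from the outputs posted above.
car_only = [
    (0.9301, 0.4329, 0.1382, 0.2682),
    (0.3839, 0.3882, 0.2696, 0.2441),
    (0.7280, 0.3345, 0.4953, 0.4068),
]
combined = (0.3826, 0.3880, 0.2707, 0.2466)  # from "car,person,cycle"

# Overlap of the combined-prompt box with each box from the "car" prompt.
overlaps = [iou(combined, box) for box in car_only]
best = max(range(len(car_only)), key=lambda i: overlaps[i])
print("closest 'car' box index:", best)
print("overlaps:", [round(v, 3) for v in overlaps])
```

The same check can be applied to any pair of runs to see whether a combined prompt returns a subset of the per-class boxes or something different.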

Changing the order so that person comes first:

TEXT_PROMPT = "person,car,cycle"
BOX_TRESHOLD = 0.25
TEXT_TRESHOLD = 0.25

image_source, image = load_image(local_image_path) 

boxes value
tensor([[0.3775, 0.5203, 0.1755, 0.7450]])

image


I hope the error can be replicated. I am putting the other scenario in the comments.
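To make the comparison repeatable, the individual and combined prompts can be driven from one loop. This is only a sketch: `compare_prompts`, `report`, and `predict_fn` are hypothetical names, and `predict_fn` stands for whatever wrapper is put around the notebook's prediction call so that it takes a prompt string and returns a list of boxes. The stub used here simply returns the boxes posted above, so the snippet runs without the model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def compare_prompts(
    predict_fn: Callable[[str], List[Box]],
    prompts: Sequence[str],
) -> Dict[str, List[Box]]:
    """Run every prompt through the same predict function and collect the boxes."""
    return {prompt: predict_fn(prompt) for prompt in prompts}


def report(results: Dict[str, List[Box]]) -> None:
    """Print how many boxes each prompt produced."""
    for prompt, boxes in results.items():
        print(f"{prompt!r}: {len(boxes)} box(es)")


# Stand-in for the real model: boxes copied from the outputs in this issue.
_REPORTED = {
    "cycle": [(0.1187, 0.5970, 0.2354, 0.3133)],
    "car": [
        (0.9301, 0.4329, 0.1382, 0.2682),
        (0.3839, 0.3882, 0.2696, 0.2441),
        (0.7280, 0.3345, 0.4953, 0.4068),
    ],
    "car,person,cycle": [(0.3826, 0.3880, 0.2707, 0.2466)],
    "person,car,cycle": [(0.3775, 0.5203, 0.1755, 0.7450)],
}


def stub_predict(prompt: str) -> List[Box]:
    """Hypothetical replacement for the model call; looks up the posted boxes."""
    return _REPORTED[prompt]


results = compare_prompts(stub_predict, list(_REPORTED))
report(results)
```

Swapping `stub_predict` for a real wrapper keeps the thresholds and image fixed across runs, so only the prompt text changes.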

TestPrab, Apr 21 '23 11:04

Next, I loaded the model as described in the notebook, and here the results were good. For example, the combined prompt gave the following:

TEXT_PROMPT = "person,car,cycle"
BOX_TRESHOLD = 0.25
TEXT_TRESHOLD = 0.25

image_source, image = load_image(local_image_path)

boxes value
tensor([[0.3785, 0.5197, 0.1796, 0.7477],
        [0.0833, 0.5734, 0.1632, 0.6694],
        [0.1197, 0.5559, 0.2346, 0.7366],
        [0.1201, 0.5977, 0.2360, 0.3183],
        [0.3807, 0.3886, 0.2650, 0.2493]])

image

Everything was kept the same except how the model was loaded, yet the results were quite different.

TestPrab, Apr 21 '23 11:04

Have you tried replacing the , with .? Like this: TEXT_PROMPT = "person . car . cycle"
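A small helper can rewrite a comma-separated prompt in the dot-separated form suggested here. This is a sketch only: `to_dot_prompt` is a made-up name, not part of GroundingDINO, and whether the change fixes the missing detections is not confirmed in this thread.

```python
def to_dot_prompt(prompt: str, separator: str = ",") -> str:
    """Turn "person,car,cycle" into "person . car . cycle"."""
    # Split on the old separator and drop empty or whitespace-only pieces.
    classes = [part.strip() for part in prompt.split(separator) if part.strip()]
    return " . ".join(classes)


TEXT_PROMPT = to_dot_prompt("person,car,cycle")
print(TEXT_PROMPT)  # person . car . cycle
```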

aixiaodewugege, May 05 '23 03:05

No, I have only tested it with , as the separator.

TestPrab, May 11 '23 11:05

I see similar behavior from the model even with '.' as the separator.

devadathj, May 31 '23 05:05