RefSAM how's the performance on refcoco?


Open yxchng opened this issue 2 years ago • 2 comments

Apr 07 '23 03:04 yxchng

It is taking a while to run, I'll probably check the results sometime during the weekend.

The initial results of this approach are fairly poor. I think the reason is that many of the RefCOCO text prompts involve spatial relations like "the man to the left of the ...", and CLIP does not have the ability to contextualize local regions within an image.

Apr 07 '23 03:04 helblazer811

Hello, I also use the CLIP model to classify the masks from SAM, but I find the performance is poor. Increasing the input image size of the CLIP model may improve the recognition accuracy for each mask.

Apr 10 '23 02:04 PengtaoJiang
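
For readers following the thread, here is a minimal sketch of the mask-ranking step discussed above: each SAM mask is scored against the text prompt by cosine similarity, and the best-scoring mask is picked. This is an illustration only, not the RefSAM implementation. The function names `cosine_similarity` and `rank_masks` are hypothetical, and the toy 3-dimensional vectors stand in for real CLIP image and text embeddings, which are assumed to be computed elsewhere.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_masks(mask_embeddings, text_embedding):
    """Score every mask embedding against the text embedding.

    Hypothetical helper: returns (index of the best mask, list of all scores).
    Each mask is scored independently of the others.
    """
    if not mask_embeddings:
        raise ValueError("need at least one mask embedding")
    scores = [cosine_similarity(e, text_embedding) for e in mask_embeddings]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy vectors standing in for CLIP embeddings of three masked image crops
# and one text prompt (assumed values, for illustration only).
mask_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
text_embedding = [0.0, 1.0, 0.0]

best_index, scores = rank_masks(mask_embeddings, text_embedding)
print(best_index, scores)
```

Because every mask is scored on its own, nothing in this step sees where a mask sits relative to the others, which may be part of why relational prompts such as "the man to the left of the ..." are hard for this kind of pipeline.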
