RefSAM
How's the performance on RefCOCO?
It is taking a while to run, so I'll probably check the results sometime during the weekend.
The initial results of this approach are fairly poor. I think the reason is that many of the RefCOCO text prompts involve spatial relations like "the man to the left of the ...". CLIP does not have the ability to contextualize local regions within an image, so it cannot resolve these relations.
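For anyone following along, here is a rough sketch of the ranking step as I understand it, not the actual RefSAM code. It assumes each SAM mask has already been cropped and embedded with CLIP, and that the text prompt has been embedded too. The function name `rank_masks_by_text` and the toy vectors are made up for illustration. Each crop is scored independently of the others, which is one way to see why a relation like "to the left of" is hard to resolve.

```python
import numpy as np


def rank_masks_by_text(crop_embeddings, text_embedding):
    """Rank mask crops by cosine similarity to a text embedding.

    crop_embeddings: (num_masks, dim) array, one row per mask crop
                     (assumed to come from a CLIP image encoder).
    text_embedding:  (dim,) array (assumed to come from a CLIP text encoder).
    Returns (order, scores), where order lists mask indices best-first.
    """
    crops = np.asarray(crop_embeddings, dtype=np.float64)
    text = np.asarray(text_embedding, dtype=np.float64)

    # Normalize so the dot product equals cosine similarity.
    crops = crops / np.linalg.norm(crops, axis=1, keepdims=True)
    text = text / np.linalg.norm(text)

    # One score per mask; no crop sees any other crop or the full image.
    scores = crops @ text
    order = np.argsort(-scores)
    return order, scores


# Toy 2-D vectors standing in for real embeddings (hypothetical values).
toy_crops = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_text = [0.0, 2.0]
order, scores = rank_masks_by_text(toy_crops, toy_text)
print("best mask index:", int(order[0]))
```

The mask at `order[0]` would be returned as the prediction for the referring expression.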
Hello, I also use the CLIP model to classify the masks from SAM, and I also find the performance poor. Increasing the input image size of the CLIP model may improve the recognition accuracy for each mask.
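A quick way to check whether input size could be a factor is to measure how much each mask crop has to be enlarged to reach the encoder's input resolution. The sketch below is only a diagnostic under assumptions: `crop_box_and_upscale` is a hypothetical helper, and `input_size=224` is assumed as the encoder resolution (use whatever your CLIP variant expects).

```python
import numpy as np


def crop_box_and_upscale(mask, input_size=224):
    """Return the tight box around a binary mask and its upscale factor.

    mask:       (H, W) boolean or 0/1 array for one mask.
    input_size: assumed side length of the image encoder input.
    Returns ((x0, y0, x1, y1), factor), or None for an empty mask.
    factor > 1 means the crop must be enlarged to fill the input.
    """
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None

    # Tight bounding box, with exclusive right and bottom edges.
    x0, x1 = int(xs.min()), int(xs.max()) + 1
    y0, y1 = int(ys.min()), int(ys.max()) + 1

    # Scale needed for the longer box side to reach the input size.
    factor = input_size / max(x1 - x0, y1 - y0)
    return (x0, y0, x1, y1), factor


# Toy mask: a 28x14 pixel region in a 100x100 image.
toy_mask = np.zeros((100, 100), dtype=bool)
toy_mask[10:24, 20:48] = True
box, factor = crop_box_and_upscale(toy_mask)
print(box, factor)
```

If most masks come back with a large factor, the crops carry little detail to begin with, so it may be worth checking that before switching to a larger model input.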