ImageBind
ImageBind copied to clipboard
ImageBind with SAM Simple Demo: Segment with Different Modalities
Thanks a lot for release such an amazing work!
We implement a simple and interesting demo by combing ImageBind with SAM here: ImageBind-SAM which can segment things with different modalities, and the project is still under develop
This basic idea is followed with IEA: Image Editing Anything and CLIP-SAM which generate the referring mask with the following steps:
- Step 1: Generate auto masks with
SamAutomaticMaskGenerator
- Step 2: Crop all the box region from the masks
- Step 3: Compute the similarity with cropped images and different modalities
- Step 4: Merge the highest similarity mask region
And the result is shown as:
Input Model | Modality | Generate Mask |
---|---|---|
car audio | ||
"A car" |
And the threshold for each box will influence a lot on the final result, we will do more test on it!