Getting the mask of the first frame without using XMem or SAM as a preprocessing step
This is a follow-up question.
Is there a way to make FoundationPose work with only the 2D bounding box of the object of interest?
Has anyone streamlined it so that providing the mask of the first frame is no longer a prerequisite?
Also, I am not sure how I can provide this prerequisite by clicking on a single point on the object. Can someone please walk me through it?
Yes, you can convert the bbox into a segmentation mask and run FoundationPose the same way. It will work fine. To convert it, set the pixels inside the box to a value >0 and the background pixels to 0.
Thanks for your response. Could you please clarify this or link me to a reference? Any chance you could provide an example?
Do you expect the performance to drop if I use a 2D bbox instead of a segmentation mask?
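Suppose your bbox is [umin, vmin, umax, vmax] (in pixels):
```python
import numpy as np

mask = np.zeros((height, width), dtype=bool)  # height, width: image size
mask[vmin:vmax, umin:umax] = True  # True (>0) inside the box, False (0) outside
```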
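A slightly more defensive version of the snippet, as a sketch: it assumes the box is in pixel coordinates as [umin, vmin, umax, vmax] with umax/vmax exclusive (matching the Python slicing above), rounds float detector outputs outward to whole pixels, and clips the box to the image. The helper name `bbox_to_mask` is hypothetical and not part of FoundationPose. It returns a uint8 mask with 255 inside the box and 0 elsewhere, which satisfies the ">0 inside, 0 outside" rule and shows up as a white box when saved as an image.

```python
import numpy as np


def bbox_to_mask(bbox, height, width):
    """Convert a pixel bbox [umin, vmin, umax, vmax] into a binary mask.

    Hypothetical helper (not part of FoundationPose). umax/vmax are treated
    as exclusive, like Python slicing. Pixels inside the box are 255, the
    background is 0.
    """
    umin, vmin, umax, vmax = bbox
    # Round float detector outputs outward to whole pixels and clip to the image.
    umin = max(0, int(np.floor(umin)))
    vmin = max(0, int(np.floor(vmin)))
    umax = min(width, int(np.ceil(umax)))
    vmax = min(height, int(np.ceil(vmax)))
    mask = np.zeros((height, width), dtype=np.uint8)
    # Skip empty or fully off-image boxes instead of slicing with bad indices.
    if umax > umin and vmax > vmin:
        mask[vmin:vmax, umin:umax] = 255
    return mask


mask = bbox_to_mask([10.4, 20.0, 50.6, 60.0], height=480, width=640)
print(mask.shape, int((mask > 0).sum()))  # (480, 640) 1640
```

To save it for the first frame, `cv2.imwrite("mask.png", mask)` is one option; which file name and folder your FoundationPose data reader expects for the first-frame mask is not covered here, so check the reader you use.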
No, it should work as well as a segmentation mask. I've tried this many times.
@wenbowen123 Thanks a lot for your guidance. I just wanted to confirm that I was able to run FoundationPose with only the 2D bbox of the first frame, in YOLOX format, by converting it to a binary mask.
Yes.
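If "YOLOX format" here means the YOLO-style label layout (`class cx cy w h`, with the center and size normalized to [0, 1] by the image width and height), the box has to be converted to absolute pixel corners before building the mask. This is an assumption: if your YOLOX detector already outputs absolute [x0, y0, x1, y1] corners, use those directly. Both helper names below are hypothetical.

```python
import numpy as np


def yolo_line_to_bbox(line, height, width):
    """Parse 'class cx cy w h' (normalized) into pixel [umin, vmin, umax, vmax].

    Assumes the YOLO-style normalized center format; hypothetical helper.
    """
    _cls, cx, cy, w, h = (float(x) for x in line.split())
    umin = (cx - w / 2) * width
    umax = (cx + w / 2) * width
    vmin = (cy - h / 2) * height
    vmax = (cy + h / 2) * height
    return [umin, vmin, umax, vmax]


def first_frame_mask(line, height, width):
    """Build a binary first-frame mask (True inside the box) from one label line."""
    umin, vmin, umax, vmax = yolo_line_to_bbox(line, height, width)
    # Round to whole pixels and clip every coordinate into the image so that
    # no negative index can wrap around in the slice below.
    umin, umax = (min(max(int(round(u)), 0), width) for u in (umin, umax))
    vmin, vmax = (min(max(int(round(v)), 0), height) for v in (vmin, vmax))
    mask = np.zeros((height, width), dtype=bool)
    mask[vmin:vmax, umin:umax] = True
    return mask


mask = first_frame_mask("0 0.5 0.5 0.25 0.5", height=480, width=640)
print(int(mask.sum()))  # 38400 = 160 x 240 pixels
```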
@wenbowen123 In this case, the generated mask will be completely white. Will it still work?
Only the area inside the 2D box will be white. Yes, this will be fine.
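To check that a box-derived mask is white only inside the box and not across the whole frame, one option (a sketch; the helper name `white_region` is hypothetical) is to measure the fraction of nonzero pixels and recover the bounding box of the white area, then compare it with the detector's box.

```python
import numpy as np


def white_region(mask):
    """Return (fraction of white pixels, [umin, vmin, umax, vmax] of the white area).

    Any pixel > 0 counts as white. umax/vmax are exclusive.
    Returns (0.0, None) for an all-black mask.
    """
    nonzero = mask > 0
    if not nonzero.any():
        return 0.0, None
    rows = np.flatnonzero(nonzero.any(axis=1))  # rows containing white pixels
    cols = np.flatnonzero(nonzero.any(axis=0))  # columns containing white pixels
    bbox = [int(cols[0]), int(rows[0]), int(cols[-1]) + 1, int(rows[-1]) + 1]
    return float(nonzero.mean()), bbox


mask = np.zeros((480, 640), dtype=np.uint8)
mask[120:360, 240:400] = 255  # white only inside the 2D box
fraction, bbox = white_region(mask)
print(fraction, bbox)  # 0.125 [240, 120, 400, 360]
```

A fraction of 1.0 would mean the whole frame is white, which usually points to a box that was not converted to pixel coordinates correctly.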