Segment-and-Track-Anything Testing on video segmentation datasets

Can this project be evaluated on video segmentation datasets such as YouTube-VIS? I see three main difficulties. First, SAM segments not only instances but also the background. Second, do the masks output by SAM come without category labels? Third, each video in the dataset is a sequence of images; does this mean the project can neither read that input format nor write the result file (JSON) needed for metric evaluation?
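For reference, here is a minimal sketch of what writing such a result file could look like. It assumes the YouTube-VIS result layout is a JSON list of per-instance entries with `video_id`, `category_id`, `score`, and one RLE (or null) per frame under `segmentations`; please check this against the dataset's official evaluation instructions. The helper names are hypothetical, and `category_id` and `score` are placeholders because SAM masks have no labels. The RLE here is the uncompressed COCO style (column-major run lengths starting with the zero run); an evaluation server may require the compressed form, which pycocotools can produce.

```python
import json


def mask_to_uncompressed_rle(mask):
    """Encode a binary mask (list of rows of 0/1) as uncompressed COCO-style RLE.

    Pixels are scanned column by column, and the counts alternate between
    runs of 0s and runs of 1s, always starting with the run of 0s.
    """
    height, width = len(mask), len(mask[0])
    counts = []
    previous, run = 0, 0
    for x in range(width):
        for y in range(height):
            value = 1 if mask[y][x] else 0
            if value != previous:
                counts.append(run)
                run = 0
                previous = value
            run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


def build_result_entry(video_id, frame_masks, score=1.0, category_id=1):
    """Build one result entry for a single tracked object (hypothetical helper).

    frame_masks holds one binary mask per frame, or None for frames where
    the object is absent. score and category_id are placeholders.
    """
    segmentations = [
        None if mask is None else mask_to_uncompressed_rle(mask)
        for mask in frame_masks
    ]
    return {
        "video_id": video_id,
        "category_id": category_id,
        "score": score,
        "segmentations": segmentations,
    }


def dump_results(entries, path):
    """Write all entries to a JSON file."""
    with open(path, "w") as handle:
        json.dump(entries, handle)
```

Category labels would still need to come from somewhere else, for example a separate classifier, since this sketch only fills in a constant.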

Apr 24 '23 06:04 fanghaook

Hi, thank you for your suggestion. We will add this feature in a future version. The current WebUI now supports Image-Sequence input. You can try it out by following the tutorial.

Apr 26 '23 03:04 yamy-cheng

I have tried inputting image sequences, and the segmentation and tracking results are very good, but there are some issues that need to be addressed:

1. The image sequence needs to be placed in a two-layer folder and zipped before it can be read successfully.
2. If you follow the image naming scheme 0.png to x.png, the generated masks and video frames come out in the wrong order; for example, 10.png through 19.png are placed before 2.png. The solution is to name the files 00001.png to 00019.png, so that every image name has the same number of digits.
3. The final output is pure mask images. Could a sequence of images with the masks overlaid on the original frames be generated as well? It is difficult to pause a GIF and inspect the segmentation quality of each frame, and the generated video also has very low resolution.
4. The generated mask images should preferably have the same names as the images in the input sequence.

These points may be overly detailed; thank you for your understanding.
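For anyone hitting the ordering problem in point 2, a minimal sketch of the zero-padding workaround follows. It assumes the frames are named with a plain number plus an extension (0.png, 1.png, ...); the function names and the default width of 5 are my own choices, not part of the project.

```python
from pathlib import Path


def pad_frame_name(name, width=5):
    """Return the file name with its numeric stem zero-padded to `width` digits."""
    path = Path(name)
    return path.stem.zfill(width) + path.suffix


def rename_frames(folder, width=5):
    """Rename every numerically named .png in `folder` to its zero-padded form."""
    folder = Path(folder)
    for frame in sorted(folder.glob("*.png")):
        if not frame.stem.isdigit():
            continue  # leave files that are not plain frame numbers untouched
        target = folder / pad_frame_name(frame.name, width)
        if target != frame:
            frame.rename(target)


# Plain string sorting puts 10.png and 19.png before 2.png ...
names = ["0.png", "1.png", "2.png", "10.png", "19.png"]
unpadded_order = sorted(names)

# ... while zero-padded names sort in frame order.
padded_order = sorted(pad_frame_name(n) for n in names)
```

With equal-length names, lexicographic order and numeric order coincide, so any tool that sorts file names as strings reads the frames in the right sequence.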

Apr 27 '23 14:04 fanghaook

Thank you very much for providing such detailed feedback!

1. What input format do you think would work better for Image-Sequence? We previously considered taking each frame of the video as a separate input file, but that is not visually appealing, since some videos contain a large number of frames.
2. Thank you for the correction. The image naming scheme described in the tutorial is indeed problematic, and we will make the necessary changes.
3 & 4. We will address these issues in the next version.

Thank you once again for your valuable feedback.

Apr 28 '23 02:04 yamy-cheng

Hi, the requests in points 2, 3, and 4 have been implemented in the latest version. The tracking result is saved in the ./tracking_results/your_video_name directory.

Apr 28 '23 03:04 yamy-cheng

You are incredibly efficient; all the issues I raised have been resolved. The current image sequence format is sufficient, since most video segmentation datasets use image sequences and do not exceed a hundred frames. Thank you for your work!

Apr 28 '23 04:04 fanghaook