VSS-CFFM
Do test results on several images, rather than whole videos, represent the performance of video semantic segmentation methods?
Here is something I'm confused about. The task of video semantic segmentation is to segment every frame of a video. But only a few frames are labeled in the test set, so the test performance reported in experiments is measured on a few images rather than on whole videos. I don't think this can represent the performance of video semantic segmentation methods. Did I misunderstand something here?
Hi, thanks for your interest. For the VSPW dataset, the test performance is measured on whole videos rather than on a few images. For Cityscapes, it is true that the test performance is measured on images rather than whole videos. Your concern is reasonable. This is why we conduct most of our experiments on VSPW rather than Cityscapes. Previously, there was no fully annotated dataset for video semantic segmentation, so researchers used Cityscapes for experiments.
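To make the difference concrete, here is a minimal sketch of how the same predictions can score differently depending on which frames carry labels. It is a toy illustration only, not the evaluation code of this repository or of either dataset; the `mean_iou` helper, the frame layout, and the numbers are all assumptions made up for the example.

```python
from typing import List, Optional, Sequence

Frame = Sequence[int]  # flattened per-pixel class ids for one frame


def mean_iou(preds: List[Frame], labels: List[Optional[Frame]], num_classes: int) -> float:
    """mIoU accumulated over every frame that has a ground-truth label.

    Frames whose label is None (unannotated) are skipped, which is what
    happens implicitly when only a few frames of a video are annotated.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred, label in zip(preds, labels):
        if label is None:
            continue  # no ground truth, so this frame cannot be scored
        for p, t in zip(pred, label):
            if p == t:
                intersection[t] += 1
                union[t] += 1
            else:
                # the pixel is in the predicted set of p and the true set of t
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 3-frame video, 4 pixels per frame, 2 classes.
dense_labels = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
# The prediction is wrong on the middle frame only.
preds = [[0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 1, 1]]

# Sparse annotation: only the last frame is labeled.
sparse_labels = [None, None, dense_labels[2]]

sparse_miou = mean_iou(preds, sparse_labels, num_classes=2)
dense_miou = mean_iou(preds, dense_labels, num_classes=2)
print(f"sparse: {sparse_miou:.4f}, dense: {dense_miou:.4f}")
```

In this toy case the error on the unlabeled middle frame is invisible to the sparse score but lowers the dense one, which is the concern raised in the question.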