
Dataset preparation for segmentation

Open YYYYYixin opened this issue 1 year ago • 1 comment

Hi! I'm trying to train the Mask2Former BEiT adapter on a custom dataset. I modified the config file based on mask2former_beit_adapter_large_896_80k_cityscapes_ss.py; training runs without errors, but the loss does not converge.

I am confused about these lines: dict(type='ToMask'), dict(type='Collect', keys=['img', 'gt_semantic_seg', 'gt_masks', 'gt_labels']). My dataset only has raw RGB images and image annotations (PNG format, pixel values in the range 0 to num_classes), so how do I get "gt_masks"? Is my dataset sufficient to train the Mask2Former BEiT adapter? Thank you so much for any help.

YYYYYixin avatar May 13 '23 16:05 YYYYYixin

Hi, here's my understanding of your confusion:

  1. dict(type='ToMask') is a transform that converts gt_semantic_seg (a 2D array with values in 0~num_classes) into binary masks (classes+1 channels, where each channel is a binary mask for the corresponding class). This transformation is necessary for training Mask2Former, since it is a mask-classification-based method. gt_masks are produced by the 'ToMask' transform in ViT-Adapter/segmentation/mmseg_custom/datasets/pipelines/formatting.py; see the first sketch after this list for the idea.
  2. The convergence problem may have many causes: an unsuitable learning rate, a wrong class-number setting, mistakes in loading the ground truth, etc. You can check whether the data are loaded correctly, or try to overfit your model on a small subset of the training data (see the second sketch below). Those are my guesses.
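A minimal sketch of what a 'ToMask'-style transform does, assuming it builds one binary mask per class that appears in the label map; this is only an illustration, not the exact code in formatting.py:

```python
import numpy as np

def semantic_seg_to_binary_masks(gt_semantic_seg: np.ndarray, num_classes: int):
    """Convert an (H, W) semantic label map into per-class binary masks.

    Illustrates the idea behind 'ToMask'; the real implementation lives in
    ViT-Adapter/segmentation/mmseg_custom/datasets/pipelines/formatting.py.
    """
    labels = np.unique(gt_semantic_seg)
    labels = labels[labels < num_classes]  # drop the ignore index (e.g. 255)
    # One (H, W) binary mask per class present in this image.
    gt_masks = np.stack([gt_semantic_seg == c for c in labels]).astype(np.uint8)
    gt_labels = labels.astype(np.int64)    # class id for each mask channel
    return gt_masks, gt_labels
```

So a plain semantic PNG annotation is sufficient: the pipeline derives gt_masks and gt_labels from gt_semantic_seg on the fly.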
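For checking the data loading, here is a rough sanity check, assuming you run it from ViT-Adapter/segmentation so the custom transforms are registered; the config path is a placeholder for your own file:

```python
from mmcv import Config
from mmseg.datasets import build_dataset

import mmseg_custom  # noqa: F401  -- registers ToMask and other custom pipelines

# Placeholder path: point this at your modified config.
cfg = Config.fromfile('configs/my_dataset/mask2former_beit_adapter_my_dataset.py')
dataset = build_dataset(cfg.data.train)

sample = dataset[0]  # one fully transformed training sample
for key, value in sample.items():
    # Unwrap DataContainer objects if present and print what the pipeline produced.
    data = value.data if hasattr(value, 'data') else value
    print(key, type(data).__name__, getattr(data, 'shape', data))
```

If gt_masks has one channel per class present in the image and gt_labels lists the matching class ids, the loading side is likely fine, and you can move on to tuning the learning rate or overfitting a small subset.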

duanduanduanyuchen avatar May 16 '23 09:05 duanduanduanyuchen