ViT-Adapter
Dataset preparation for segmentation
Hi! I'm trying to train the Mask2Former BEiT adapter on a customized dataset, and I modified the config file based on mask2former_beit_adapter_large_896_80k_cityscapes_ss.py. During training there is no bug, but the loss does not converge.
I am confused about these lines:
```python
dict(type='ToMask'),
dict(type='Collect', keys=['img', 'gt_semantic_seg', 'gt_masks', 'gt_labels'])
```
Because my dataset only has raw RGB images and image annotations (PNG format, pixel values in the range 0 to num_classes), how do I get `gt_masks`?
Is my dataset sufficient to train the Mask2Former BEiT adapter?
Thank you so much for any help.
Hi, here's my understanding of your confusion:
- `dict(type='ToMask')` is a transformation that converts the `gt_semantic_seg` (a 2D array with values in the range 0~classes) into binary masks (classes+1 channels, where each channel is a binary mask of the corresponding class). This transformation is necessary for training Mask2Former, as it is a mask-classification-based method. `gt_masks` are produced by the `ToMask` transform in `ViT-Adapter/segmentation/mmseg_custom/datasets/pipelines/formatting.py`; a rough sketch follows after this list.
- The convergence problem may have many causes: an unsuitable learning rate, a wrong class-number setting, mistakes in loading the ground truth, etc. You can check whether the data are loaded correctly (a quick sanity check is sketched below) or try to overfit your model on a small subset of the training data. Those are my guesses.
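
For intuition, here is a minimal sketch of what a `ToMask`-style transform does. This is not the repo's exact implementation: the helper name `semantic_to_masks`, the `ignore_index` handling, and the choice to emit one mask per class *present* in the map are illustrative assumptions.

```python
import numpy as np

def semantic_to_masks(gt_semantic_seg, ignore_index=255):
    """Convert a 2D semantic map into per-class binary masks and labels.

    gt_semantic_seg: (H, W) array with pixel values in [0, num_classes),
    plus an optional ignore_index. Returns gt_masks of shape (N, H, W)
    and gt_labels of shape (N,), one entry per class present in the map.
    """
    classes_present = np.unique(gt_semantic_seg)
    classes_present = classes_present[classes_present != ignore_index]

    if len(classes_present):
        gt_masks = np.stack(
            [(gt_semantic_seg == c).astype(np.uint8) for c in classes_present],
            axis=0,
        )
    else:
        gt_masks = np.zeros((0,) + gt_semantic_seg.shape, dtype=np.uint8)
    gt_labels = classes_present.astype(np.int64)
    return gt_masks, gt_labels

# Usage: a toy 2x3 map with classes 0, 1 and one ignored pixel
seg = np.array([[0, 0, 1],
                [1, 255, 0]])
masks, labels = semantic_to_masks(seg)
print(masks.shape, labels)  # (2, 2, 3) [0 1]
```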
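And as a quick data sanity check, you could scan your annotation PNGs and confirm every pixel value is either a valid class id or the ignore label. The directory path, `NUM_CLASSES`, and `IGNORE_INDEX` values below are placeholders for your dataset:

```python
import numpy as np
from PIL import Image
from pathlib import Path

NUM_CLASSES = 19      # placeholder: set to your dataset's class count
IGNORE_INDEX = 255    # common ignore label in mmseg-style datasets
ANN_DIR = Path('data/my_dataset/annotations/train')  # placeholder path

for ann_path in sorted(ANN_DIR.glob('*.png')):
    ann = np.array(Image.open(ann_path))
    values = np.unique(ann)
    # Flag any pixel value that is neither a valid class id nor the ignore label
    bad = values[(values >= NUM_CLASSES) & (values != IGNORE_INDEX)]
    if bad.size:
        print(f'{ann_path.name}: unexpected label values {bad.tolist()}')
```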