Segment-Everything-Everywhere-All-At-Once

During the evaluation phase, an error is raised stating that only a batch size of 1 is supported

Open • EricZavier opened this issue 1 year ago • 8 comments

[Screenshot: QQ image 20231212111948]

During the evaluation phase, an error is raised stating that only a batch size of 1 is supported. Here is the command I used:

CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -n 4 python entry.py train \
    --conf_files ./configs/seem/samvitb_unicl_lang_v1.yaml \
    --overrides \
    FP16 True \
    COCO.INPUT.IMAGE_SIZE 1024 \
    MODEL.DECODER.HIDDEN_DIM 512 \
    MODEL.ENCODER.CONVS_DIM 512 \
    MODEL.ENCODER.MASK_DIM 512 \
    TEST.BATCH_SIZE_TOTAL 4 \
    TRAIN.BATCH_SIZE_TOTAL 16 \
    TRAIN.BATCH_SIZE_PER_GPU 4 \
    SOLVER.MAX_NUM_EPOCHS 1 \
    SOLVER.BASE_LR 0.0001 \
    SOLVER.FIX_PARAM.backbone True \
    SOLVER.FIX_PARAM.lang_encoder True \
    SOLVER.FIX_PARAM.pixel_decoder True \
    MODEL.DECODER.COST_SPATIAL.CLASS_WEIGHT 5.0 \
    MODEL.DECODER.COST_SPATIAL.MASK_WEIGHT 2.0 \
    MODEL.DECODER.COST_SPATIAL.DICE_WEIGHT 2.0 \
    MODEL.DECODER.TOP_SPATIAL_LAYERS 10 \
    MODEL.DECODER.SPATIAL.ENABLED True \
    MODEL.DECODER.GROUNDING.ENABLED True \
    FIND_UNUSED_PARAMETERS True \
    ATTENTION_ARCH.SPATIAL_MEMORIES 32 \
    MODEL.DECODER.SPATIAL.MAX_ITER 5 \
    ATTENTION_ARCH.QUERY_NUMBER 3 \
    STROKE_SAMPLER.MAX_CANDIDATE 10 \
    MODEL.BACKBONE.PRETRAINED ./xdecoder_data/pretrained/sam_vit_b_01ec64.pth \
    WEIGHT True \
    RESUME_FROM ./xdecoder_data/pretrained/focalb_lang_unicl.pt

EricZavier, Dec 12 '23 03:12

I have the same problem.

CrazyLenmon, Dec 17 '23 15:12

Friend, have you resolved the issue? I suspect the validation data I downloaded may not be the correct version.

EricZavier, Dec 23 '23 09:12

Yeah, I just set all the batch sizes to 1 and it works, probably because evaluation runs on only one GPU. I followed dataset.md and had no problems preparing the data.

CrazyLenmon, Dec 23 '23 14:12

Thank you very much for your patient reply. Could you share your training command as a reference? I am using 4 GPUs and would also like to switch to training on one GPU like you. Also, which version of the PascalVOC dataset mentioned in dataset.md did you download? I am using VOCtrainval (2007), and my error also occurs on the PascalVOC data: for a single PNG image under JPEGImages, len(batched_inputs) was 2.

EricZavier, Dec 24 '23 05:12

Evaluation uses one image per GPU because if we concatenate images of different shapes into a single batch, e.g. one image of [512, 1024] and another of [1024, 512], the concatenated batch becomes [2, 1024, 1024], and padding that many zeros significantly hurts performance.
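To put a number on the padding cost in that example, here is a small sketch. It assumes each image is zero-padded to the largest height and width in the batch and ignores any extra rounding to a size divisibility; `padded_batch_shape` and `padding_fraction` are illustrative helpers, not functions from the repository.

```python
from math import prod


def padded_batch_shape(shapes):
    """Shape (N, H_max, W_max) after zero-padding every (H, W) image
    to the largest height and width in the batch (assumed behavior)."""
    h_max = max(h for h, _ in shapes)
    w_max = max(w for _, w in shapes)
    return (len(shapes), h_max, w_max)


def padding_fraction(shapes):
    """Fraction of the padded batch tensor that is zero padding."""
    real_pixels = sum(h * w for h, w in shapes)
    total_pixels = prod(padded_batch_shape(shapes))
    return 1 - real_pixels / total_pixels


shapes = [(512, 1024), (1024, 512)]  # the two images from the comment above
print(padded_batch_shape(shapes))  # (2, 1024, 1024)
print(padding_fraction(shapes))    # 0.5
```

Under these assumptions, half of the batch tensor is padding, whereas a batch of one image per GPU needs none.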

MaureenZOU, Dec 24 '23 14:12

  1. If you want to download PascalVOC, please use the 2012 version from http://host.robots.ox.ac.uk/pascal/VOC/
  2. For the VOC len(batched_inputs) problem, either change the config, e.g. https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once/blob/7b2e76dbb17d0b7831c6813a921fe2bc8de22926/configs/seem/focalt_unicl_lang_v1.yaml#L330, or add VOC.TEST.BATCH_SIZE_TOTAL 1 to the overrides in the command.
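The thread does not show how the repository parses `--overrides`. As a hypothetical sketch, assuming each dotted `KEY VALUE` pair simply sets the matching nested key of the loaded YAML config, adding `VOC.TEST.BATCH_SIZE_TOTAL 1` to the command has the same effect as editing `BATCH_SIZE_TOTAL` under `VOC` / `TEST` in the file. `parse_value` and `apply_overrides` are illustrative names, not the repo's loader.

```python
import ast


def parse_value(raw):
    """Convert a command-line string to a Python value when possible.

    "True" -> True, "1" -> 1, "0.0001" -> 0.0001; anything that is not a
    Python literal (e.g. a checkpoint path) stays a plain string.
    """
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_overrides(config, overrides):
    """Apply flat KEY VALUE pairs with dotted keys to a nested dict in place."""
    if len(overrides) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(overrides[0::2], overrides[1::2]):
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            # Create intermediate sections such as "VOC" and "TEST" if missing.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return config


# Start from an empty dict for illustration; the real YAML has many more keys.
config = apply_overrides({}, ["VOC.TEST.BATCH_SIZE_TOTAL", "1", "FP16", "True"])
print(config)  # {'VOC': {'TEST': {'BATCH_SIZE_TOTAL': 1}}, 'FP16': True}
```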

MaureenZOU, Dec 24 '23 14:12

Thank you for your patient answer. If I have 4 GPUs, should I set VOC.TEST.BATCH_SIZE_TOTAL to 4?

EricZavier, Dec 26 '23 01:12

Yes, exactly. That value is the total test batch size, and each GPU automatically loads len(batch) = BATCH_SIZE_TOTAL / NUM_GPUS.
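The per-GPU arithmetic can be sketched as follows, using the numbers from the original command (4 GPUs, TRAIN.BATCH_SIZE_TOTAL 16). It assumes the total must divide evenly across GPUs and, following this thread, that evaluation accepts exactly one image per GPU; `per_gpu_batch_size` and `check_eval_batch` are hypothetical helpers, not code from the repository.

```python
def per_gpu_batch_size(batch_size_total, num_gpus):
    """len(batch) on each GPU = BATCH_SIZE_TOTAL / NUM_GPUS (even split assumed)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if batch_size_total % num_gpus != 0:
        raise ValueError(
            f"BATCH_SIZE_TOTAL={batch_size_total} is not divisible by {num_gpus} GPUs"
        )
    return batch_size_total // num_gpus


def check_eval_batch(test_batch_size_total, num_gpus):
    """Evaluation expects one image per GPU, so the test total must equal num_gpus."""
    per_gpu = per_gpu_batch_size(test_batch_size_total, num_gpus)
    if per_gpu != 1:
        raise ValueError(f"evaluation supports 1 image per GPU, got {per_gpu}")
    return per_gpu


num_gpus = 4  # CUDA_VISIBLE_DEVICES=0,1,2,3 with mpirun -n 4
print(per_gpu_batch_size(16, num_gpus))  # 4, matching TRAIN.BATCH_SIZE_PER_GPU 4
print(check_eval_batch(4, num_gpus))     # 1, i.e. VOC.TEST.BATCH_SIZE_TOTAL 4
```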

MaureenZOU, Dec 26 '23 02:12