pytorch-segmentation-toolbox
The program is stuck.
(pytorch-0.41) <phd-1@kbkb541-server pytorch-segmentation-toolbox>$CUDA_VISIBLE_DEVICES=0,1,2,3 sh ./run_local.sh /media/phd-1/syz/OCNet/dataset/cityscapes Linux kbkb541-server 4.15.0-39-generic #42~16.04.1-Ubuntu SMP Wed Oct 24 17:09:54 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux 2018年 12月 04日 星期二 17:25:46 CST ResNet( (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn1): InPlaceABNSync(64, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu1): ReLU() (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): InPlaceABNSync(64, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu2): ReLU() (conv3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn3): InPlaceABNSync(128, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu3): ReLU() (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=True) (relu): ReLU() (layer1): Sequential( (0): Bottleneck( (conv1): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(64, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): InPlaceABNSync(64, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) (downsample): Sequential( (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) ) ) (1): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(64, eps=1e-05, 
momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): InPlaceABNSync(64, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (2): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(64, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): InPlaceABNSync(64, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) ) (layer2): Sequential( (0): Bottleneck( (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(128, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): InPlaceABNSync(128, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) (downsample): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) ) ) (1): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), 
bias=False) (bn1): InPlaceABNSync(128, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): InPlaceABNSync(128, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (2): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(128, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): InPlaceABNSync(128, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (3): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(128, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): InPlaceABNSync(128, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) ) (layer3): Sequential( (0): Bottleneck( (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, 
kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) (downsample): Sequential( (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) ) ) (1): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (2): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (3): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, 
eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (4): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (5): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (6): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, 
kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (7): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (8): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (9): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, 
eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (10): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (11): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (12): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, 
kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (13): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (14): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (15): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, 
affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (16): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (17): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (18): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) 
(19): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (20): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (21): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) (22): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): 
InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2), bias=False) (bn2): InPlaceABNSync(256, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(1024, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) ) (layer4): Sequential( (0): Bottleneck( (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(4, 4), dilation=(4, 4), bias=False) (bn2): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(2048, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) (downsample): Sequential( (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): InPlaceABNSync(2048, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) ) ) (1): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(4, 4), dilation=(4, 4), bias=False) (bn2): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(2048, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() 
(relu_inplace): ReLU(inplace) ) (2): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(4, 4), dilation=(4, 4), bias=False) (bn2): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): InPlaceABNSync(2048, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=none) (relu): ReLU() (relu_inplace): ReLU(inplace) ) ) (head): Sequential( (0): PSPModule( (stages): ModuleList( (0): Sequential( (0): AdaptiveAvgPool2d(output_size=(1, 1)) (1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (2): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=leaky_relu slope=0.01) ) (1): Sequential( (0): AdaptiveAvgPool2d(output_size=(2, 2)) (1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (2): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=leaky_relu slope=0.01) ) (2): Sequential( (0): AdaptiveAvgPool2d(output_size=(3, 3)) (1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (2): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=leaky_relu slope=0.01) ) (3): Sequential( (0): AdaptiveAvgPool2d(output_size=(6, 6)) (1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (2): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=leaky_relu slope=0.01) ) ) (bottleneck): Sequential( (0): Conv2d(4096, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (1): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=leaky_relu slope=0.01) (2): Dropout2d(p=0.1) ) ) (1): 
Conv2d(512, 19, kernel_size=(1, 1), stride=(1, 1)) ) (dsn): Sequential( (0): Conv2d(1024, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): InPlaceABNSync(512, eps=1e-05, momentum=0.1, affine=True, devices=[0, 1, 2, 3], activation=leaky_relu slope=0.01) (2): Dropout2d(p=0.1) (3): Conv2d(512, 19, kernel_size=(1, 1), stride=(1, 1)) ) ) /home/phd-1/.conda/envs/pytorch-0.41/lib/python3.6/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead. warnings.warn(warning.format(ret)) 321300 images are loaded!
It doesn't continue. Why? I think it may be because of InPlaceABNSync. How can I solve it?
Hi @suyanzhou626, I cannot find the problem from this information. Please first make sure your data loader can actually access the images and labels.
@suyanzhou626 Hi, I ran into the same problem. Have you solved it?
Same problem here, but my program got stuck after printing iteration 1, with no error output. I am running on 4x 12 GB GPUs (three TITAN Xp, one TITAN V) with batch size 4x2=8.
If I reduce the batch size to 4x1=4 with the default crop size of 769, or keep the default batch size of 4x2=8 with a smaller crop size of 761, the program runs fine.
So it looks like a memory problem, but there is no out-of-memory error; the program just hangs. @speedinghzl Any thoughts? Thanks.
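One way to tell a genuine hang from silent memory exhaustion is to watch per-GPU memory while the job sits there. A minimal sketch (the helper name is mine; it assumes the NVIDIA driver's `nvidia-smi` tool is on the PATH):

```python
import shutil
import subprocess

def gpu_memory_report():
    """Return per-GPU memory usage as CSV text, or a note if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found"
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,name,memory.used,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True)
    return result.stdout

print(gpu_memory_report())
```

If one card reports usage right at its ceiling while the others are idle, the process is most likely stalled on an allocation rather than dead.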
@lzrobots Yes, it is caused by running out of memory. Is the TITAN V in the first position on your server? If so, you can change the order of the GPU IDs (e.g. 1,0,2,3) to solve this problem. Or you can run this repo with a crop size of 761; it does not affect the final result.
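If you prefer to do the reordering from Python rather than in the shell, the environment variable has to be set before PyTorch initializes CUDA. A sketch, using the GPU order suggested above:

```python
import os

# Make the card at physical index 1 appear as cuda:0, and push the
# TITAN V (physical index 0) into the second slot. This must run before
# `import torch` or any other CUDA initialization.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,0,2,3"

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

This works because `nn.DataParallel` places gathered outputs and extra bookkeeping on the first visible device, so that slot should go to a card with headroom.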
Yes solved. Thanks!
I am facing the same problem as @lzrobots. I tried BS=8 with INPUT_SIZE=[769, 769] or [761, 761], and training is stuck after iteration 1. I have 4x 12 GB 1080 Ti GPUs. With a smaller BS, say BS=4, the program runs well, but I'm afraid it may affect the final performance. Any suggestions here, @speedinghzl?
[EDIT] It is stuck even with a lower input size, [713, 713]. Only lowering BS seems to help. Is there any workaround?
The 1080 Ti only has 11 GB of memory, so you can try lowering the batch size. But I think it will affect the performance (~77% rather than ~78%).
Hi @speedinghzl, thanks for your swift response. Yes, the available memory is only around 11 GB. I think I can live with that much performance difference, so I will proceed with a lower BS.
Thanks for your help!
Hi @speedinghzl ,
Just for your information: changing to BS=4 while keeping everything else as it was, I got an mIoU of ~75.8%.
When you set BS=4, you should increase the number of iterations from 40K to 80K. Then you can also increase the input size to make use of the ~11 GB of memory.
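The reasoning behind doubling the schedule: halving the batch size halves the samples seen per step, so doubling the steps keeps the total training-sample budget unchanged. A quick check with the numbers from this thread:

```python
def total_samples(steps, batch_size):
    """Total training samples seen over the whole schedule."""
    return steps * batch_size

# Default: 40K steps at batch 8; reduced-memory run: 80K steps at batch 4.
assert total_samples(40_000, 8) == total_samples(80_000, 4) == 320_000
print(total_samples(80_000, 4))
```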
Hi @speedinghzl ,
Your suggestions make sense. I'll try them out now and update you with the outcome. Thanks for your help!
@speedinghzl Sorry, I have the same issue; my program even gets stuck on 4x TITAN Xp with the default settings.
Hi @speedinghzl,
I get a score of 76.22% after changing STEPS from 40K to 80K (keeping BS=4 and everything else as it is).
I did not try a larger input size, but that also seems like an option worth trying, since there is still some memory left that could be used. Thanks for your help!
Hi @d-li14, could you try running with BS=4 and STEPS=80K and see if that solves your problem? The performance numbers reported above are there for your reference.
@aasharma90 Thanks for your kind advice! Shrinking the batch size can definitely fit the model into GPU memory with ease, but we have to sacrifice performance, as demonstrated in your experiments (it is even unable to reproduce the original DeepLab result, significantly below the reported 78.9%).
I am curious: as stated by the author, 4x 12 GB of VRAM should be enough to run the script successfully, but in my case it does not seem to work. Any helpful advice? @speedinghzl
Actually, 4x 12 GB of VRAM is not enough. I have run run_local.sh
with 4 Tesla M40 GPUs, and the memory usage on GPU 0 exceeded 12,800 MB. Without modifying any of the default settings of this script, I got a final mIoU of 77.4%.