SparseInst
ValueError: matrix contains invalid numeric entries for binary classification
Hi! I would like to train sparse_inst_r50_giam_fp16 for binary classification. I registered my train and test datasets and started training with the command
python3.9 tools/train_net.py --config-file configs/sparse_inst_r50_giam_fp16.yaml --num-gpus 1 SOLVER.AMP.ENABLED True
As far as I understand, there is no need to resize the images in my datasets (they are all 2048x2448).
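For completeness, the datasets were registered via detectron2's standard COCO registration, roughly as below (only the train json path appears in the logs; the val json and image-root paths here are assumed for illustration):
from detectron2.data.datasets import register_coco_instances

# Dataset names match DATASETS.TRAIN/TEST in the config below.
# The val json and image_root paths are assumptions for illustration.
register_coco_instances("maf_train", {}, "/raid/kirill/test/data/maf_final/train.json", "/raid/kirill/test/data/maf_final/images")
register_coco_instances("maf_val", {}, "/raid/kirill/test/data/maf_final/val.json", "/raid/kirill/test/data/maf_final/images")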
However, I got ValueError: matrix contains invalid numeric entries on the 97th iteration. Here is my environment and the full logs:
[09/29 01:09:23] detectron2 INFO: Rank of current process: 0. World size: 1
[09/29 01:09:31] detectron2 INFO: Environment info:
---------------------- ------------------------------------------------------------------------------------------
sys.platform linux
Python 3.9.10 (main, Jan 15 2022, 18:56:52) [GCC 7.5.0]
numpy 1.23.3
detectron2 0.6 @/raid/kirill/test/venv/lib/python3.9/site-packages/detectron2
Compiler GCC 7.3
CUDA compiler CUDA 11.1
detectron2 arch flags 3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6
DETECTRON2_ENV_MODULE <not set>
PyTorch 1.10.0+cu111 @/raid/kirill/test/venv/lib/python3.9/site-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0 Tesla V100-SXM3-32GB (arch=7.0)
Driver version 450.142.00
CUDA_HOME /usr/local/cuda
Pillow 9.2.0
torchvision 0.11.0+cu111 @/raid/kirill/test/venv/lib/python3.9/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20220512
iopath 0.1.9
cv2 4.6.0
---------------------- ------------------------------------------------------------------------------------------
PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 11.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.0.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
[09/29 01:09:31] detectron2 INFO: Command line arguments: Namespace(config_file='configs/sparse_inst_r50_giam_fp16.yaml', resume=False, eval_only=False, num_gpus=1, num_machines=1, machine_rank=0, dist_url='tcp://127.0.0.1:50153', opts=['SOLVER.AMP.ENABLED', 'True'])
[09/29 01:09:31] detectron2 INFO: Contents of args.config_file=configs/sparse_inst_r50_giam_fp16.yaml:
_BASE_: "Base-SparseInst.yaml"
MODEL:
WEIGHTS: "pretrained_models/R-50.pkl"
SOLVER:
AMP:
ENABLED: True
OUTPUT_DIR: "output/sparse_inst_r50_giam_fp16"
[09/29 01:09:31] detectron2 INFO: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:
ASPECT_RATIO_GROUPING: true
FILTER_EMPTY_ANNOTATIONS: true
NUM_WORKERS: 6
REPEAT_THRESHOLD: 0.0
SAMPLER_TRAIN: TrainingSampler
DATASETS:
PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
PROPOSAL_FILES_TEST: []
PROPOSAL_FILES_TRAIN: []
TEST:
- maf_val
TRAIN:
- maf_train
GLOBAL:
HACK: 1.0
INPUT:
CROP:
ENABLED: false
SIZE:
- 0.9
- 0.9
TYPE: relative_range
FORMAT: RGB
MASK_FORMAT: bitmask
MAX_SIZE_TEST: 853
MAX_SIZE_TRAIN: 853
MIN_SIZE_TEST: 640
MIN_SIZE_TRAIN:
- 416
- 448
- 480
- 512
- 544
- 576
- 608
- 640
MIN_SIZE_TRAIN_SAMPLING: choice
RANDOM_FLIP: horizontal
MODEL:
ANCHOR_GENERATOR:
ANGLES:
- - -90
- 0
- 90
ASPECT_RATIOS:
- - 0.5
- 1.0
- 2.0
NAME: DefaultAnchorGenerator
OFFSET: 0.0
SIZES:
- - 32
- 64
- 128
- 256
- 512
BACKBONE:
FREEZE_AT: 0
NAME: build_resnet_backbone
CSPNET:
NAME: darknet53
NORM: ''
OUT_FEATURES:
- csp1
- csp2
- csp3
- csp4
DEVICE: cuda
FPN:
FUSE_TYPE: sum
IN_FEATURES: []
NORM: ''
OUT_CHANNELS: 256
KEYPOINT_ON: false
LOAD_PROPOSALS: false
MASK_ON: true
META_ARCHITECTURE: SparseInst
PANOPTIC_FPN:
COMBINE:
ENABLED: true
INSTANCES_CONFIDENCE_THRESH: 0.5
OVERLAP_THRESH: 0.5
STUFF_AREA_LIMIT: 4096
INSTANCE_LOSS_WEIGHT: 1.0
PIXEL_MEAN:
- 123.675
- 116.28
- 103.53
PIXEL_STD:
- 58.395
- 57.12
- 57.375
PROPOSAL_GENERATOR:
MIN_SIZE: 0
NAME: RPN
PVT:
LINEAR: false
NAME: b1
OUT_FEATURES:
- p2
- p3
- p4
RESNETS:
DEFORM_MODULATED: false
DEFORM_NUM_GROUPS: 1
DEFORM_ON_PER_STAGE:
- false
- false
- false
- false
DEPTH: 50
NORM: FrozenBN
NUM_GROUPS: 1
OUT_FEATURES:
- res3
- res4
- res5
RES2_OUT_CHANNELS: 256
RES5_DILATION: 1
STEM_OUT_CHANNELS: 64
STRIDE_IN_1X1: false
WIDTH_PER_GROUP: 64
RETINANET:
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_WEIGHTS: &id002
- 1.0
- 1.0
- 1.0
- 1.0
FOCAL_LOSS_ALPHA: 0.25
FOCAL_LOSS_GAMMA: 2.0
IN_FEATURES:
- p3
- p4
- p5
- p6
- p7
IOU_LABELS:
- 0
- -1
- 1
IOU_THRESHOLDS:
- 0.4
- 0.5
NMS_THRESH_TEST: 0.5
NORM: ''
NUM_CLASSES: 80
NUM_CONVS: 4
PRIOR_PROB: 0.01
SCORE_THRESH_TEST: 0.05
SMOOTH_L1_LOSS_BETA: 0.1
TOPK_CANDIDATES_TEST: 1000
ROI_BOX_CASCADE_HEAD:
BBOX_REG_WEIGHTS:
- &id001
- 10.0
- 10.0
- 5.0
- 5.0
- - 20.0
- 20.0
- 10.0
- 10.0
- - 30.0
- 30.0
- 15.0
- 15.0
IOUS:
- 0.5
- 0.6
- 0.7
ROI_BOX_HEAD:
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_LOSS_WEIGHT: 1.0
BBOX_REG_WEIGHTS: *id001
CLS_AGNOSTIC_BBOX_REG: false
CONV_DIM: 256
FC_DIM: 1024
NAME: ''
NORM: ''
NUM_CONV: 0
NUM_FC: 0
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
SMOOTH_L1_BETA: 0.0
TRAIN_ON_PRED_BOXES: false
ROI_HEADS:
BATCH_SIZE_PER_IMAGE: 512
IN_FEATURES:
- res4
IOU_LABELS:
- 0
- 1
IOU_THRESHOLDS:
- 0.5
NAME: Res5ROIHeads
NMS_THRESH_TEST: 0.5
NUM_CLASSES: 80
POSITIVE_FRACTION: 0.25
PROPOSAL_APPEND_GT: true
SCORE_THRESH_TEST: 0.05
ROI_KEYPOINT_HEAD:
CONV_DIMS:
- 512
- 512
- 512
- 512
- 512
- 512
- 512
- 512
LOSS_WEIGHT: 1.0
MIN_KEYPOINTS_PER_IMAGE: 1
NAME: KRCNNConvDeconvUpsampleHead
NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
NUM_KEYPOINTS: 17
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
ROI_MASK_HEAD:
CLS_AGNOSTIC_MASK: false
CONV_DIM: 256
NAME: MaskRCNNConvUpsampleHead
NORM: ''
NUM_CONV: 0
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
RPN:
BATCH_SIZE_PER_IMAGE: 256
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_LOSS_WEIGHT: 1.0
BBOX_REG_WEIGHTS: *id002
BOUNDARY_THRESH: -1
CONV_DIMS:
- -1
HEAD_NAME: StandardRPNHead
IN_FEATURES:
- res4
IOU_LABELS:
- 0
- -1
- 1
IOU_THRESHOLDS:
- 0.3
- 0.7
LOSS_WEIGHT: 1.0
NMS_THRESH: 0.7
POSITIVE_FRACTION: 0.5
POST_NMS_TOPK_TEST: 1000
POST_NMS_TOPK_TRAIN: 2000
PRE_NMS_TOPK_TEST: 6000
PRE_NMS_TOPK_TRAIN: 12000
SMOOTH_L1_BETA: 0.0
SEM_SEG_HEAD:
COMMON_STRIDE: 4
CONVS_DIM: 128
IGNORE_VALUE: 255
IN_FEATURES:
- p2
- p3
- p4
- p5
LOSS_WEIGHT: 1.0
NAME: SemSegFPNHead
NORM: GN
NUM_CLASSES: 54
SPARSE_INST:
CLS_THRESHOLD: 0.005
DATASET_MAPPER: SparseInstDatasetMapper
DECODER:
GROUPS: 4
INST:
CONVS: 4
DIM: 256
KERNEL_DIM: 128
MASK:
CONVS: 4
DIM: 256
NAME: GroupIAMDecoder
NUM_CLASSES: 2
NUM_MASKS: 100
OUTPUT_IAM: false
SCALE_FACTOR: 2.0
ENCODER:
IN_FEATURES:
- res3
- res4
- res5
NAME: InstanceContextEncoder
NORM: ''
NUM_CHANNELS: 256
LOSS:
CLASS_WEIGHT: 2.0
ITEMS:
- labels
- masks
MASK_DICE_WEIGHT: 2.0
MASK_PIXEL_WEIGHT: 5.0
NAME: SparseInstCriterion
OBJECTNESS_WEIGHT: 1.0
MASK_THRESHOLD: 0.45
MATCHER:
ALPHA: 0.8
BETA: 0.2
NAME: SparseInstMatcher
MAX_DETECTIONS: 100
WEIGHTS: pretrained_models/R-50.pkl
OUTPUT_DIR: output/sparse_inst_r50_giam_fp16
SEED: -1
SOLVER:
AMP:
ENABLED: true
AMSGRAD: false
BACKBONE_MULTIPLIER: 1.0
BASE_LR: 5.0e-05
BIAS_LR_FACTOR: 1.0
CHECKPOINT_PERIOD: 5000
CLIP_GRADIENTS:
CLIP_TYPE: value
CLIP_VALUE: 1.0
ENABLED: false
NORM_TYPE: 2.0
GAMMA: 0.1
IMS_PER_BATCH: 8
LR_SCHEDULER_NAME: WarmupMultiStepLR
MAX_ITER: 170000
MOMENTUM: 0.9
NESTEROV: false
OPTIMIZER: ADAMW
REFERENCE_WORLD_SIZE: 0
STEPS:
- 210000
- 250000
WARMUP_FACTOR: 0.001
WARMUP_ITERS: 1000
WARMUP_METHOD: linear
WEIGHT_DECAY: 0.05
WEIGHT_DECAY_BIAS: null
WEIGHT_DECAY_NORM: 0.0
TEST:
AUG:
ENABLED: false
FLIP: true
MAX_SIZE: 4000
MIN_SIZES:
- 400
- 500
- 600
- 700
- 800
- 900
- 1000
- 1100
- 1200
DETECTIONS_PER_IMAGE: 100
EVAL_PERIOD: 7330
EXPECTED_RESULTS: []
KEYPOINT_OKS_SIGMAS: []
PRECISE_BN:
ENABLED: false
NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0
[09/29 01:09:31] detectron2 INFO: Full config saved to output/sparse_inst_r50_giam_fp16/config.yaml
[09/29 01:09:31] d2.utils.env INFO: Using a generated random seed 31206096
[09/29 01:09:39] d2.engine.defaults INFO: Model:
SparseInst(
(backbone): ResNet(
(stem): BasicStem(
(conv1): Conv2d(
3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
)
(res2): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv1): Conv2d(
64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
)
(res3): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv1): Conv2d(
256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
)
(res4): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv1): Conv2d(
512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(4): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(5): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
)
(res5): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv1): Conv2d(
1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
)
)
(encoder): InstanceContextEncoder(
(fpn_laterals): ModuleList(
(0): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
(1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(2): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
)
(fpn_outputs): ModuleList(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
(ppm): PyramidPoolingModule(
(stages): ModuleList(
(0): Sequential(
(0): AdaptiveAvgPool2d(output_size=(1, 1))
(1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
)
(1): Sequential(
(0): AdaptiveAvgPool2d(output_size=(2, 2))
(1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
)
(2): Sequential(
(0): AdaptiveAvgPool2d(output_size=(3, 3))
(1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
)
(3): Sequential(
(0): AdaptiveAvgPool2d(output_size=(6, 6))
(1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
)
)
(bottleneck): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
)
(fusion): Conv2d(768, 256, kernel_size=(1, 1), stride=(1, 1))
)
(decoder): GroupIAMDecoder(
(inst_branch): GroupInstanceBranch(
(inst_convs): Sequential(
(0): Conv2d(258, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU(inplace=True)
(6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): ReLU(inplace=True)
)
(iam_conv): Conv2d(256, 400, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
(fc): Linear(in_features=1024, out_features=1024, bias=True)
(cls_score): Linear(in_features=1024, out_features=2, bias=True)
(mask_kernel): Linear(in_features=1024, out_features=128, bias=True)
(objectness): Linear(in_features=1024, out_features=1, bias=True)
)
(mask_branch): MaskBranch(
(mask_convs): Sequential(
(0): Conv2d(258, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU(inplace=True)
(6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(7): ReLU(inplace=True)
)
(projection): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
)
)
(criterion): SparseInstCriterion(
(matcher): SparseInstMatcher()
)
)
[09/29 01:09:39] sparseinst.dataset_mapper INFO: [DatasetMapper] Augmentations used in training: [RandomFlip(), ResizeShortestEdge(short_edge_length=(416, 448, 480, 512, 544, 576, 608, 640), max_size=853, sample_style='choice')]
[09/29 01:09:39] d2.data.datasets.coco INFO: Loaded 1997 images in COCO format from /raid/kirill/test/data/maf_final/train.json
[09/29 01:09:39] d2.data.build INFO: Removed 3 images with no usable annotations. 1994 images left.
[09/29 01:09:39] d2.data.build INFO: Distribution of instances among all 2 categories:
| category | #instances | category | #instances |
|:-----------:|:-------------|:----------:|:-------------|
| colonna_box | 8170 | sphere_box | 7577 |
| | | | |
| total | 15747 | | |
[09/29 01:09:39] d2.data.build INFO: Using training sampler TrainingSampler
[09/29 01:09:39] d2.data.common INFO: Serializing 1994 elements to byte tensors and concatenating them all ...
[09/29 01:09:39] d2.data.common INFO: Serialized dataset takes 7.23 MiB
[09/29 01:09:39] d2.solver.build WARNING: SOLVER.STEPS contains values larger than SOLVER.MAX_ITER. These values will be ignored.
[09/29 01:09:39] fvcore.common.checkpoint INFO: [Checkpointer] Loading from pretrained_models/R-50.pkl ...
[09/29 01:09:39] fvcore.common.checkpoint INFO: Reading a file from 'torchvision'
[09/29 01:09:39] d2.checkpoint.c2_model_loading INFO: Following weights matched with submodule backbone:
| Names in Model | Names in Checkpoint | Shapes |
|:------------------|:----------------------------------------------------------------------------------|:------------------------------------------------|
| res2.0.conv1.* | res2.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,64,1,1) |
| res2.0.conv2.* | res2.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,64,3,3) |
| res2.0.conv3.* | res2.0.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,64,1,1) |
| res2.0.shortcut.* | res2.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,64,1,1) |
| res2.1.conv1.* | res2.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,256,1,1) |
| res2.1.conv2.* | res2.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,64,3,3) |
| res2.1.conv3.* | res2.1.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,64,1,1) |
| res2.2.conv1.* | res2.2.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,256,1,1) |
| res2.2.conv2.* | res2.2.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,64,3,3) |
| res2.2.conv3.* | res2.2.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,64,1,1) |
| res3.0.conv1.* | res3.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,256,1,1) |
| res3.0.conv2.* | res3.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,128,3,3) |
| res3.0.conv3.* | res3.0.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,128,1,1) |
| res3.0.shortcut.* | res3.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,256,1,1) |
| res3.1.conv1.* | res3.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,512,1,1) |
| res3.1.conv2.* | res3.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,128,3,3) |
| res3.1.conv3.* | res3.1.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,128,1,1) |
| res3.2.conv1.* | res3.2.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,512,1,1) |
| res3.2.conv2.* | res3.2.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,128,3,3) |
| res3.2.conv3.* | res3.2.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,128,1,1) |
| res3.3.conv1.* | res3.3.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,512,1,1) |
| res3.3.conv2.* | res3.3.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (128,) (128,) (128,) (128,) (128,128,3,3) |
| res3.3.conv3.* | res3.3.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,128,1,1) |
| res4.0.conv1.* | res4.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,512,1,1) |
| res4.0.conv2.* | res4.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.0.conv3.* | res4.0.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1) |
| res4.0.shortcut.* | res4.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (1024,) (1024,) (1024,) (1024,) (1024,512,1,1) |
| res4.1.conv1.* | res4.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,1024,1,1) |
| res4.1.conv2.* | res4.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.1.conv3.* | res4.1.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1) |
| res4.2.conv1.* | res4.2.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,1024,1,1) |
| res4.2.conv2.* | res4.2.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.2.conv3.* | res4.2.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1) |
| res4.3.conv1.* | res4.3.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,1024,1,1) |
| res4.3.conv2.* | res4.3.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.3.conv3.* | res4.3.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1) |
| res4.4.conv1.* | res4.4.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,1024,1,1) |
| res4.4.conv2.* | res4.4.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.4.conv3.* | res4.4.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1) |
| res4.5.conv1.* | res4.5.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,1024,1,1) |
| res4.5.conv2.* | res4.5.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,256,3,3) |
| res4.5.conv3.* | res4.5.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1) |
| res5.0.conv1.* | res5.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,1024,1,1) |
| res5.0.conv2.* | res5.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,512,3,3) |
| res5.0.conv3.* | res5.0.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (2048,) (2048,) (2048,) (2048,) (2048,512,1,1) |
| res5.0.shortcut.* | res5.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (2048,) (2048,) (2048,) (2048,) (2048,1024,1,1) |
| res5.1.conv1.* | res5.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,2048,1,1) |
| res5.1.conv2.* | res5.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,512,3,3) |
| res5.1.conv3.* | res5.1.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (2048,) (2048,) (2048,) (2048,) (2048,512,1,1) |
| res5.2.conv1.* | res5.2.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,2048,1,1) |
| res5.2.conv2.* | res5.2.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,512,3,3) |
| res5.2.conv3.* | res5.2.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (2048,) (2048,) (2048,) (2048,) (2048,512,1,1) |
| stem.conv1.* | stem.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (64,) (64,) (64,) (64,) (64,3,7,7) |
[09/29 01:09:39] fvcore.common.checkpoint WARNING: Some model parameters or buffers are not found in the checkpoint:
decoder.inst_branch.cls_score.{bias, weight}
decoder.inst_branch.fc.{bias, weight}
decoder.inst_branch.iam_conv.{bias, weight}
decoder.inst_branch.inst_convs.0.{bias, weight}
decoder.inst_branch.inst_convs.2.{bias, weight}
decoder.inst_branch.inst_convs.4.{bias, weight}
decoder.inst_branch.inst_convs.6.{bias, weight}
decoder.inst_branch.mask_kernel.{bias, weight}
decoder.inst_branch.objectness.{bias, weight}
decoder.mask_branch.mask_convs.0.{bias, weight}
decoder.mask_branch.mask_convs.2.{bias, weight}
decoder.mask_branch.mask_convs.4.{bias, weight}
decoder.mask_branch.mask_convs.6.{bias, weight}
decoder.mask_branch.projection.{bias, weight}
encoder.fpn_laterals.0.{bias, weight}
encoder.fpn_laterals.1.{bias, weight}
encoder.fpn_laterals.2.{bias, weight}
encoder.fpn_outputs.0.{bias, weight}
encoder.fpn_outputs.1.{bias, weight}
encoder.fpn_outputs.2.{bias, weight}
encoder.fusion.{bias, weight}
encoder.ppm.bottleneck.{bias, weight}
encoder.ppm.stages.0.1.{bias, weight}
encoder.ppm.stages.1.1.{bias, weight}
encoder.ppm.stages.2.1.{bias, weight}
encoder.ppm.stages.3.1.{bias, weight}
[09/29 01:09:39] fvcore.common.checkpoint WARNING: The checkpoint state_dict contains keys that are not used by the model:
stem.fc.{bias, weight}
[09/29 01:09:39] d2.engine.train_loop INFO: Starting training from iteration 0
[09/29 01:09:46] d2.utils.events INFO: eta: 11:40:45 iter: 19 total_loss: 8.062 loss_ce: 2.207 loss_objectness: 0.7034 loss_dice: 1.997 loss_mask: 3.15 time: 0.2915 data_time: 0.1088 lr: 9.9905e-07 max_mem: 3306M
[09/29 01:09:52] d2.utils.events INFO: eta: 11:37:23 iter: 39 total_loss: 5.251 loss_ce: 2.201 loss_objectness: 0.681 loss_dice: 1.998 loss_mask: 0.3648 time: 0.2725 data_time: 0.0608 lr: 1.998e-06 max_mem: 3306M
[09/29 01:09:57] d2.utils.events INFO: eta: 11:32:33 iter: 59 total_loss: 4.8 loss_ce: 2.168 loss_objectness: 0.5971 loss_dice: 2 loss_mask: 0.04513 time: 0.2637 data_time: 0.0434 lr: 2.9971e-06 max_mem: 3306M
[09/29 01:10:02] d2.utils.events INFO: eta: 11:27:17 iter: 79 total_loss: 4.495 loss_ce: 2.049 loss_objectness: 0.3802 loss_dice: 2 loss_mask: 0.06612 time: 0.2663 data_time: 0.0682 lr: 3.9961e-06 max_mem: 3306M
[09/29 01:10:07] d2.utils.events INFO: eta: 11:22:18 iter: 99 total_loss: 3.744 loss_ce: 1.58 loss_objectness: 0.07274 loss_dice: 2 loss_mask: 0.1108 time: 0.2642 data_time: 0.0532 lr: 4.9951e-06 max_mem: 3306M
[09/29 01:10:12] d2.utils.events INFO: eta: 11:14:59 iter: 119 total_loss: 3.076 loss_ce: 0.9848 loss_objectness: 0.006809 loss_dice: 2 loss_mask: 0.06297 time: 0.2605 data_time: 0.0441 lr: 5.9941e-06 max_mem: 3306M
[09/29 01:10:17] d2.utils.events INFO: eta: 11:10:59 iter: 139 total_loss: 2.884 loss_ce: 0.8411 loss_objectness: 0.01047 loss_dice: 1.996 loss_mask: 0.03778 time: 0.2582 data_time: 0.0453 lr: 6.9931e-06 max_mem: 3306M
[09/29 01:10:22] d2.utils.events INFO: eta: 11:09:34 iter: 159 total_loss: 2.858 loss_ce: 0.8003 loss_objectness: 0.00477 loss_dice: 2 loss_mask: 0.04612 time: 0.2578 data_time: 0.0608 lr: 7.9921e-06 max_mem: 3306M
[09/29 01:10:27] d2.utils.events INFO: eta: 11:09:09 iter: 179 total_loss: 3 loss_ce: 0.9067 loss_objectness: 0.01053 loss_dice: 2 loss_mask: 0.08507 time: 0.2577 data_time: 0.0532 lr: 8.9911e-06 max_mem: 3306M
[09/29 01:10:33] d2.utils.events INFO: eta: 11:12:19 iter: 199 total_loss: 2.951 loss_ce: 0.8948 loss_objectness: 0.007725 loss_dice: 2 loss_mask: 0.05398 time: 0.2589 data_time: 0.0681 lr: 9.9901e-06 max_mem: 3306M
[09/29 01:10:38] d2.utils.events INFO: eta: 11:13:31 iter: 219 total_loss: 2.849 loss_ce: 0.811 loss_objectness: 0.03054 loss_dice: 1.976 loss_mask: 0.02872 time: 0.2595 data_time: 0.0613 lr: 1.0989e-05 max_mem: 3306M
[09/29 01:10:43] d2.utils.events INFO: eta: 11:13:26 iter: 239 total_loss: 2.832 loss_ce: 0.799 loss_objectness: 0.04742 loss_dice: 1.962 loss_mask: 0.01929 time: 0.2601 data_time: 0.0579 lr: 1.1988e-05 max_mem: 3306M
[09/29 01:10:48] d2.utils.events INFO: eta: 11:13:21 iter: 259 total_loss: 2.75 loss_ce: 0.7074 loss_objectness: 0.1952 loss_dice: 1.841 loss_mask: 0.02929 time: 0.2594 data_time: 0.0469 lr: 1.2987e-05 max_mem: 3306M
[09/29 01:10:53] d2.utils.events INFO: eta: 11:12:49 iter: 279 total_loss: 2.748 loss_ce: 0.7283 loss_objectness: 0.197 loss_dice: 1.819 loss_mask: 0.02512 time: 0.2591 data_time: 0.0560 lr: 1.3986e-05 max_mem: 3306M
[09/29 01:10:59] d2.utils.events INFO: eta: 11:10:31 iter: 299 total_loss: 2.764 loss_ce: 0.7646 loss_objectness: 0.2209 loss_dice: 1.722 loss_mask: 0.01942 time: 0.2595 data_time: 0.0649 lr: 1.4985e-05 max_mem: 3306M
[09/29 01:11:04] d2.utils.events INFO: eta: 11:13:07 iter: 319 total_loss: 2.743 loss_ce: 0.7367 loss_objectness: 0.2909 loss_dice: 1.681 loss_mask: 0.02197 time: 0.2597 data_time: 0.0602 lr: 1.5984e-05 max_mem: 3306M
[09/29 01:11:09] d2.utils.events INFO: eta: 11:12:34 iter: 339 total_loss: 2.78 loss_ce: 0.8159 loss_objectness: 0.2865 loss_dice: 1.65 loss_mask: 0.01936 time: 0.2598 data_time: 0.0640 lr: 1.6983e-05 max_mem: 3306M
[09/29 01:11:14] d2.utils.events INFO: eta: 11:12:56 iter: 359 total_loss: 2.814 loss_ce: 0.8537 loss_objectness: 0.2925 loss_dice: 1.646 loss_mask: 0.02119 time: 0.2595 data_time: 0.0519 lr: 1.7982e-05 max_mem: 3306M
[09/29 01:11:20] d2.utils.events INFO: eta: 11:13:43 iter: 379 total_loss: 2.674 loss_ce: 0.7588 loss_objectness: 0.3244 loss_dice: 1.58 loss_mask: 0.01589 time: 0.2597 data_time: 0.0611 lr: 1.8981e-05 max_mem: 3306M
[09/29 01:11:25] d2.utils.events INFO: eta: 11:13:08 iter: 399 total_loss: 2.684 loss_ce: 0.7729 loss_objectness: 0.3391 loss_dice: 1.564 loss_mask: 0.02145 time: 0.2591 data_time: 0.0514 lr: 1.998e-05 max_mem: 3306M
[09/29 01:11:30] d2.utils.events INFO: eta: 11:13:04 iter: 419 total_loss: 2.67 loss_ce: 0.7592 loss_objectness: 0.328 loss_dice: 1.581 loss_mask: 0.01416 time: 0.2589 data_time: 0.0525 lr: 2.0979e-05 max_mem: 3306M
[09/29 01:11:35] d2.utils.events INFO: eta: 11:12:37 iter: 439 total_loss: 2.728 loss_ce: 0.8217 loss_objectness: 0.3437 loss_dice: 1.535 loss_mask: 0.01109 time: 0.2586 data_time: 0.0579 lr: 2.1978e-05 max_mem: 3306M
[09/29 01:11:40] d2.utils.events INFO: eta: 11:12:54 iter: 459 total_loss: 2.696 loss_ce: 0.8166 loss_objectness: 0.3379 loss_dice: 1.539 loss_mask: 0.01454 time: 0.2581 data_time: 0.0453 lr: 2.2977e-05 max_mem: 3306M
[09/29 01:11:40] d2.engine.train_loop ERROR: Exception during training:
Traceback (most recent call last):
File "/raid/kirill/test/venv/lib/python3.9/site-packages/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/raid/kirill/test/venv/lib/python3.9/site-packages/detectron2/engine/defaults.py", line 494, in run_step
self._trainer.run_step()
File "/raid/kirill/test/venv/lib/python3.9/site-packages/detectron2/engine/train_loop.py", line 395, in run_step
loss_dict = self.model(data)
File "/raid/kirill/test/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/raid/kirill/test/SparseInst/./sparseinst/sparseinst.py", line 107, in forward
losses = self.criterion(output, targets, max_shape)
File "/raid/kirill/test/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/raid/kirill/test/SparseInst/./sparseinst/loss.py", line 184, in forward
indices = self.matcher(outputs_without_aux, targets, input_shape)
File "/raid/kirill/test/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/raid/kirill/test/SparseInst/./sparseinst/loss.py", line 301, in forward
indices = [linear_sum_assignment(c[i], maximize=True)
File "/raid/kirill/test/SparseInst/./sparseinst/loss.py", line 301, in <listcomp>
indices = [linear_sum_assignment(c[i], maximize=True)
ValueError: matrix contains invalid numeric entries
[09/29 01:11:40] d2.engine.hooks INFO: Overall training speed: 461 iterations in 0:01:59 (0.2582 s / it)
[09/29 01:11:40] d2.engine.hooks INFO: Total training time: 0:01:59 (0:00:00 on hooks)
[09/29 01:11:40] d2.utils.events INFO: eta: 11:12:44 iter: 463 total_loss: 2.682 loss_ce: 0.8189 loss_objectness: 0.3674 loss_dice: 1.459 loss_mask: 0.0153 time: 0.2578 data_time: 0.0435 lr: 2.3127e-05 max_mem: 3306M
I read this issue. However, this error still occurs. Are there any steps to avoid it?
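For reference, SciPy raises exactly this error whenever the cost matrix passed to linear_sum_assignment contains NaN or inf, so some entries of c[i] in sparseinst/loss.py must be going non-finite. A minimal reproduction:
import numpy as np
from scipy.optimize import linear_sum_assignment

# A single NaN in the cost matrix is enough to trigger the error.
cost = np.array([[0.1, np.nan],
                 [0.7, 0.3]])
try:
    linear_sum_assignment(cost, maximize=True)
except ValueError as e:
    print(e)  # matrix contains invalid numeric entries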
Hi @kirillkoncha, sorry for the late reply. Maybe you can fix it by:
iam_prob = iam_prob.view(B, N, -1)
normalizer = iam_prob.sum(-1).float().clamp(min=1e-4)
iam_prob = iam_prob / normalizer[:, :, None]
Mostly, the NaN errors occur due to fp16 training.
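To see why the .float() upcast and the clamp matter: under fp16 AMP, sigmoid of a strongly negative activation underflows to exactly 0, the per-instance normalizer sums to 0, and the resulting 0/0 division produces the NaNs that later crash the matcher. A standalone sketch (B, N, HW are placeholder shapes, not the upstream code):
import torch

B, N, HW = 1, 2, 4
# Strongly negative IAM logits: sigmoid(-20) underflows to 0.0 in float16.
iam = torch.full((B, N, HW), -20.0, dtype=torch.float16)
iam_prob = iam.sigmoid().view(B, N, -1)

naive = iam_prob / iam_prob.sum(-1)[:, :, None]        # 0 / 0 -> NaN
print(naive.isnan().any())                             # True

normalizer = iam_prob.sum(-1).float().clamp(min=1e-4)  # the suggested fix
fixed = iam_prob / normalizer[:, :, None]
print(fixed.isnan().any())                             # False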
Still getting the error after applying this fix.
Hi all, if you run into this problem, you could try this:
iam_prob = F.softmax(F.logsigmoid(iam.view(B, N, -1)), dim=-1)
which avoids the numerical stability problem.
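A quick sanity check on the same kind of degenerate input as above shows why this stays finite (a sketch with placeholder shapes, not the library code):
import torch
import torch.nn.functional as F

B, N, HW = 1, 2, 4
iam = torch.full((B, N, HW), -20.0, dtype=torch.float16)

# logsigmoid stays in log-space and softmax renormalizes with the usual
# max-subtraction trick, so no row ever collapses to all zeros.
iam_prob = F.softmax(F.logsigmoid(iam.float()).view(B, N, -1), dim=-1)
print(iam_prob.isnan().any())  # False
print(iam_prob.sum(-1))        # each row sums to 1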