SparseInst
Getting "matrix contains invalid numeric entries" error
When trying SparseInst with ViT, I get this error:
File "/home/user/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user/ard/SparseInst/sparseinst/loss.py", line 301, in forward
indices = [linear_sum_assignment(c[i], maximize=True)
File "/home/user/ard/SparseInst/sparseinst/loss.py", line 301, in <listcomp>
indices = [linear_sum_assignment(c[i], maximize=True)
ValueError: matrix contains invalid numeric entries
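For context, this ValueError is raised by SciPy's linear_sum_assignment when the matching cost matrix contains NaN or Inf entries. A minimal pre-check (a hypothetical helper, not part of the SparseInst code) can fail earlier with a message that points at the offending entries:

```python
import numpy as np

def check_cost_matrix(c):
    """Fail early with the locations of the bad entries, instead of
    SciPy's generic 'matrix contains invalid numeric entries'."""
    bad = np.argwhere(~np.isfinite(c))
    if len(bad) > 0:
        raise ValueError(f"cost matrix has non-finite entries at {bad[:5].tolist()}")

c = np.array([[0.9, 0.1], [np.nan, 0.4]])
try:
    check_cost_matrix(c)
except ValueError as e:
    print(e)  # cost matrix has non-finite entries at [[1, 0]]
```

Calling this on `c[i]` just before the assignment (e.g. inside loss.py's forward) makes it clear whether the NaNs originate in the cost matrix rather than in SciPy itself.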
Here's the printed config:
[07/29 18:08:15 detectron2]: Command line arguments: Namespace(config_file='configs/sparse_inst_pvt_b2_li_giam.yaml', dist_url='tcp://127.0.0.1:50153', eval_only=False, machine_rank=0, num_gpus=4, num_machines=1, opts=['SOLVER.AMP.ENABLED', 'True'], resume=False)
[07/29 18:08:15 detectron2]: Contents of args.config_file=configs/sparse_inst_pvt_b2_li_giam.yaml:
_BASE_: "Base-SparseInst.yaml"
MODEL:
WEIGHTS: "pretrained_models/pvt_v2_b2_li.pth"
BACKBONE:
NAME: "build_pyramid_vision_transformer"
SPARSE_INST:
ENCODER:
IN_FEATURES: ["p2", "p3", "p4"]
PVT:
NAME: "b2"
LINEAR: True
OUT_FEATURES: ["p2", "p3", "p4"]
OUTPUT_DIR: "output/sparse_inst_pvt_b2_linear_giam"
[07/29 18:08:15 detectron2]: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:
ASPECT_RATIO_GROUPING: true
FILTER_EMPTY_ANNOTATIONS: true
NUM_WORKERS: 6
REPEAT_THRESHOLD: 0.0
SAMPLER_TRAIN: TrainingSampler
DATASETS:
PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
PROPOSAL_FILES_TEST: []
PROPOSAL_FILES_TRAIN: []
TEST:
- bipacolortest
TRAIN:
- bipacolortrain
GLOBAL:
HACK: 1.0
INPUT:
CROP:
ENABLED: false
SIZE:
- 0.9
- 0.9
TYPE: relative_range
FORMAT: RGB
MASK_FORMAT: bitmask
MAX_SIZE_TEST: 853
MAX_SIZE_TRAIN: 853
MIN_SIZE_TEST: 640
MIN_SIZE_TRAIN:
- 416
- 448
- 480
- 512
- 544
- 576
- 608
- 640
MIN_SIZE_TRAIN_SAMPLING: choice
RANDOM_FLIP: horizontal
MODEL:
ANCHOR_GENERATOR:
ANGLES:
- - -90
- 0
- 90
ASPECT_RATIOS:
- - 0.5
- 1.0
- 2.0
NAME: DefaultAnchorGenerator
OFFSET: 0.0
SIZES:
- - 32
- 64
- 128
- 256
- 512
BACKBONE:
FREEZE_AT: 0
NAME: build_pyramid_vision_transformer
CSPNET:
NAME: darknet53
NORM: ''
OUT_FEATURES:
- csp1
- csp2
- csp3
- csp4
DEVICE: cuda
FPN:
FUSE_TYPE: sum
IN_FEATURES: []
NORM: ''
OUT_CHANNELS: 256
KEYPOINT_ON: false
LOAD_PROPOSALS: false
MASK_ON: true
META_ARCHITECTURE: SparseInst
PANOPTIC_FPN:
COMBINE:
ENABLED: true
INSTANCES_CONFIDENCE_THRESH: 0.5
OVERLAP_THRESH: 0.5
STUFF_AREA_LIMIT: 4096
INSTANCE_LOSS_WEIGHT: 1.0
PIXEL_MEAN:
- 123.675
- 116.28
- 103.53
PIXEL_STD:
- 58.395
- 57.12
- 57.375
PROPOSAL_GENERATOR:
MIN_SIZE: 0
NAME: RPN
PVT:
LINEAR: true
NAME: b2
OUT_FEATURES:
- p2
- p3
- p4
RESNETS:
DEFORM_MODULATED: false
DEFORM_NUM_GROUPS: 1
DEFORM_ON_PER_STAGE:
- false
- false
- false
- false
DEPTH: 50
NORM: FrozenBN
NUM_GROUPS: 1
OUT_FEATURES:
- res3
- res4
- res5
RES2_OUT_CHANNELS: 256
RES5_DILATION: 1
STEM_OUT_CHANNELS: 64
STRIDE_IN_1X1: false
WIDTH_PER_GROUP: 64
RETINANET:
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_WEIGHTS: &id002
- 1.0
- 1.0
- 1.0
- 1.0
FOCAL_LOSS_ALPHA: 0.25
FOCAL_LOSS_GAMMA: 2.0
IN_FEATURES:
- p3
- p4
- p5
- p6
- p7
IOU_LABELS:
- 0
- -1
- 1
IOU_THRESHOLDS:
- 0.4
- 0.5
NMS_THRESH_TEST: 0.5
NORM: ''
NUM_CLASSES: 80
NUM_CONVS: 4
PRIOR_PROB: 0.01
SCORE_THRESH_TEST: 0.05
SMOOTH_L1_LOSS_BETA: 0.1
TOPK_CANDIDATES_TEST: 1000
ROI_BOX_CASCADE_HEAD:
BBOX_REG_WEIGHTS:
- &id001
- 10.0
- 10.0
- 5.0
- 5.0
- - 20.0
- 20.0
- 10.0
- 10.0
- - 30.0
- 30.0
- 15.0
- 15.0
IOUS:
- 0.5
- 0.6
- 0.7
ROI_BOX_HEAD:
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_LOSS_WEIGHT: 1.0
BBOX_REG_WEIGHTS: *id001
CLS_AGNOSTIC_BBOX_REG: false
CONV_DIM: 256
FC_DIM: 1024
NAME: ''
NORM: ''
NUM_CONV: 0
NUM_FC: 0
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
SMOOTH_L1_BETA: 0.0
TRAIN_ON_PRED_BOXES: false
ROI_HEADS:
BATCH_SIZE_PER_IMAGE: 512
IN_FEATURES:
- res4
IOU_LABELS:
- 0
- 1
IOU_THRESHOLDS:
- 0.5
NAME: Res5ROIHeads
NMS_THRESH_TEST: 0.5
NUM_CLASSES: 80
POSITIVE_FRACTION: 0.25
PROPOSAL_APPEND_GT: true
SCORE_THRESH_TEST: 0.05
ROI_KEYPOINT_HEAD:
CONV_DIMS:
- 512
- 512
- 512
- 512
- 512
- 512
- 512
- 512
LOSS_WEIGHT: 1.0
MIN_KEYPOINTS_PER_IMAGE: 1
NAME: KRCNNConvDeconvUpsampleHead
NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
NUM_KEYPOINTS: 17
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
ROI_MASK_HEAD:
CLS_AGNOSTIC_MASK: false
CONV_DIM: 256
NAME: MaskRCNNConvUpsampleHead
NORM: ''
NUM_CONV: 0
POOLER_RESOLUTION: 14
POOLER_SAMPLING_RATIO: 0
POOLER_TYPE: ROIAlignV2
RPN:
BATCH_SIZE_PER_IMAGE: 256
BBOX_REG_LOSS_TYPE: smooth_l1
BBOX_REG_LOSS_WEIGHT: 1.0
BBOX_REG_WEIGHTS: *id002
BOUNDARY_THRESH: -1
CONV_DIMS:
- -1
HEAD_NAME: StandardRPNHead
IN_FEATURES:
- res4
IOU_LABELS:
- 0
- -1
- 1
IOU_THRESHOLDS:
- 0.3
- 0.7
LOSS_WEIGHT: 1.0
NMS_THRESH: 0.7
POSITIVE_FRACTION: 0.5
POST_NMS_TOPK_TEST: 1000
POST_NMS_TOPK_TRAIN: 2000
PRE_NMS_TOPK_TEST: 6000
PRE_NMS_TOPK_TRAIN: 12000
SMOOTH_L1_BETA: 0.0
SEM_SEG_HEAD:
COMMON_STRIDE: 4
CONVS_DIM: 128
IGNORE_VALUE: 255
IN_FEATURES:
- p2
- p3
- p4
- p5
LOSS_WEIGHT: 1.0
NAME: SemSegFPNHead
NORM: GN
NUM_CLASSES: 54
SPARSE_INST:
CLS_THRESHOLD: 0.005
DATASET_MAPPER: SparseInstDatasetMapper
DECODER:
GROUPS: 4
INST:
CONVS: 4
DIM: 256
KERNEL_DIM: 128
MASK:
CONVS: 4
DIM: 256
NAME: GroupIAMDecoder
NUM_CLASSES: 10
NUM_MASKS: 100
OUTPUT_IAM: false
SCALE_FACTOR: 2.0
ENCODER:
IN_FEATURES:
- p2
- p3
- p4
NAME: InstanceContextEncoder
NORM: ''
NUM_CHANNELS: 256
LOSS:
CLASS_WEIGHT: 2.0
ITEMS:
- labels
- masks
MASK_DICE_WEIGHT: 2.0
MASK_PIXEL_WEIGHT: 5.0
NAME: SparseInstCriterion
OBJECTNESS_WEIGHT: 1.0
MASK_THRESHOLD: 0.45
MATCHER:
ALPHA: 0.8
BETA: 0.2
NAME: SparseInstMatcher
MAX_DETECTIONS: 100
WEIGHTS: sparse_inst_pvt_v2_b2_li_giam_02e25d.pth
OUTPUT_DIR: output/sparse_inst_pvt_b2_linear_giam
SEED: -1
SOLVER:
AMP:
ENABLED: true
AMSGRAD: false
BACKBONE_MULTIPLIER: 1.0
BASE_LR: 5.0e-05
BIAS_LR_FACTOR: 1.0
CHECKPOINT_PERIOD: 5000
CLIP_GRADIENTS:
CLIP_TYPE: value
CLIP_VALUE: 1.0
ENABLED: false
NORM_TYPE: 2.0
GAMMA: 0.1
IMS_PER_BATCH: 32
LR_SCHEDULER_NAME: WarmupMultiStepLR
MAX_ITER: 1500
MOMENTUM: 0.9
NESTEROV: false
OPTIMIZER: ADAMW
REFERENCE_WORLD_SIZE: 0
STEPS:
- 1166
- 1388
WARMUP_FACTOR: 0.001
WARMUP_ITERS: 1000
WARMUP_METHOD: linear
WEIGHT_DECAY: 0.05
WEIGHT_DECAY_BIAS: null
WEIGHT_DECAY_NORM: 0.0
TEST:
AUG:
ENABLED: false
FLIP: true
MAX_SIZE: 4000
MIN_SIZES:
- 400
- 500
- 600
- 700
- 800
- 900
- 1000
- 1100
- 1200
DETECTIONS_PER_IMAGE: 100
EVAL_PERIOD: 60
EXPECTED_RESULTS: []
KEYPOINT_OKS_SIGMAS: []
PRECISE_BN:
ENABLED: false
NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0
Hi @sarmientoj24, thanks for your interest in SparseInst. Have you loaded any pretrained weights?
Yes
Could you provide the log of the training process?
Hello, I ran into the same problem when I tried to change the optimizer from ADAMW to SGD.
I ran into this problem too. The cause is SOLVER.AMP.ENABLED: true. Setting it to false (i.e., training in FP32) makes the error disappear. I tried to debug it further but wasn't able to find a proper fix.
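If you want to keep that workaround in a config file rather than pass it on the command line, the override is just this fragment (following the same Base-SparseInst.yaml layout shown above):

```yaml
SOLVER:
  AMP:
    ENABLED: False
```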
I haven't loaded any pre-trained weights, and the problem still occurs.
Hi all, I've found that the sigmoid + norm in the decoder causes the NaN error when FP16 is enabled. In the latest update, we provide a special softmax version of the decoder to avoid numerical errors, and it supports FP16 better than the sigmoid + norm. Sorry for the late reply, and I hope my suggestion can help you.
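To illustrate the failure mode (a sketch with NumPy's float16, not the actual SparseInst decoder code): under FP16, the sigmoid of a strongly negative activation underflows to 0, so normalizing by the row sum divides 0 by 0 and produces NaN, while a softmax over the same row (computed with the usual max-subtraction trick) stays finite:

```python
import numpy as np

# Strongly negative IAM-style activations, as can occur early in training.
x = np.full((1, 4), -20.0, dtype=np.float16)

with np.errstate(over="ignore", invalid="ignore"):
    # sigmoid + norm: sigmoid(-20) ~ 2e-9 underflows to 0 in float16,
    # so the row sum is 0 and the normalization yields 0/0 = NaN.
    w = 1.0 / (1.0 + np.exp(-x))
    w = w / w.sum(axis=-1, keepdims=True)

    # softmax: subtracting the row max keeps the exponents at 0, so the
    # result is a well-defined uniform distribution.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    s = e / e.sum(axis=-1, keepdims=True)

print(np.isnan(w).any())  # True  (sigmoid + norm blows up in fp16)
print(np.isnan(s).any())  # False (softmax stays finite)
```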
I'm getting the same problem now. I am using pre-trained weights and trying to train the R-50-vd-DCN model. Are there additional steps needed to use the new softmax version?
It seems sigmoid + norm is used by default. Adding MODEL.SPARSE_INST.DECODER.NAME GroupIAMSoftDecoder to the command line solved the problem for me.
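Equivalently, the decoder can be switched in the config file itself (a fragment following the same structure as the configs above):

```yaml
MODEL:
  SPARSE_INST:
    DECODER:
      NAME: "GroupIAMSoftDecoder"
```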
It still does not work...