yolov7_d2: Can't train with the SOLOv2 config

Hi, thanks for this repo :-) I am trying to train the network (and overfit it) using the SOLOv2 config. When I start the training process, an image is displayed, but the masks are wrong (the image is flipped, but the masks are not). When I close the image, training crashes. The log is attached. Thanks

python3 train_net.py --config-file configs/coco-instance/solov2_lite.yaml 
Install mish-cuda to speed up training and inference. More importantly, replace the naive Mish with MishCuda will give a ~1.5G memory saving during training.
Command Line Args: Namespace(config_file='configs/coco-instance/solov2_lite.yaml', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[05/09 13:50:05 detectron2]: Rank of current process: 0. World size: 1
[05/09 13:50:06 detectron2]: Environment info:
----------------------  ---------------------------------------------------------------------
sys.platform            linux
Python                  3.6.9 (default, Mar 15 2022, 13:55:28) [GCC 8.4.0]
numpy                   1.19.2
detectron2              0.6 @/home/ws/.local/lib/python3.6/site-packages/detectron2
Compiler                GCC 7.3
CUDA compiler           CUDA 10.2
detectron2 arch flags   3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.10.0+cu102 @/home/ws/.local/lib/python3.6/site-packages/torch
PyTorch debug build     False
GPU available           Yes
GPU 0                   GeForce RTX 2080 Ti (arch=7.5)
Driver version          450.57
CUDA_HOME               /usr/local/cuda-10.2
Pillow                  6.2.2
torchvision             0.11.0+cu102 @/home/ws/.local/lib/python3.6/site-packages/torchvision
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5
fvcore                  0.1.5.post20220414
iopath                  0.1.9
cv2                     4.5.5
----------------------  ---------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

[05/09 13:50:06 detectron2]: Command line arguments: Namespace(config_file='configs/coco-instance/solov2_lite.yaml', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[05/09 13:50:06 detectron2]: Contents of args.config_file=configs/coco-instance/solov2_lite.yaml:
MODEL:
  META_ARCHITECTURE: "SOLOv2"
  MASK_ON: True
  BACKBONE:
    NAME: "build_resnet_fpn_backbone"
  RESNETS:
    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
  FPN:
    IN_FEATURES: ["res2", "res3", "res4", "res5"]
  SOLOV2:
    FPN_SCALE_RANGES: ((1, 56), (28, 112), (56, 224), (112, 448), (224, 896))
    NUM_GRIDS: [40, 36, 24, 16, 12]
    NUM_INSTANCE_CONVS: 2
    NUM_KERNELS: 256
    INSTANCE_IN_CHANNELS: 256
    INSTANCE_CHANNELS: 128
    MASK_IN_CHANNELS: 256
    MASK_CHANNELS: 128
    NORM: "SyncBN"
DATASETS:
  TRAIN: ("nets_kinneret_only24",)
  TEST: ("nets_kinneret_only24",)
SOLVER:
  IMS_PER_BATCH: 8
  BASE_LR: 0.01
  WARMUP_FACTOR: 0.01
  WARMUP_ITERS: 1000
  STEPS: (60000, 80000)
  MAX_ITER: 90000
INPUT:
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
  MASK_FORMAT: "bitmask"
VERSION: 2

[05/09 13:50:06 detectron2]: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  CLASS_NAMES: []
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
  - nets_kinneret_only24
  TRAIN:
  - nets_kinneret_only24
GLOBAL:
  HACK: 1.0
INPUT:
  COLOR_JITTER:
    BRIGHTNESS: false
    LIGHTING: false
    SATURATION: false
  CROP:
    ENABLED: false
    SIZE:
    - 0.9
    - 0.9
    TYPE: relative_range
  DISTORTION:
    ENABLED: false
    EXPOSURE: 1.5
    HUE: 0.1
    SATURATION: 1.5
  FORMAT: BGR
  GRID_MASK:
    ENABLED: false
    MODE: 1
    PROB: 0.3
    USE_HEIGHT: true
    USE_WIDTH: true
  INPUT_SIZE:
  - 640
  - 640
  JITTER_CROP:
    ENABLED: false
    JITTER_RATIO: 0.3
  MASK_FORMAT: bitmask
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN:
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
  MIN_SIZE_TRAIN_SAMPLING: choice
  MOSAIC:
    DEBUG_VIS: false
    ENABLED: false
    MIN_OFFSET: 0.2
    MOSAIC_HEIGHT: 640
    MOSAIC_WIDTH: 640
    NUM_IMAGES: 4
    POOL_CAPACITY: 1000
  MOSAIC_AND_MIXUP:
    DEBUG_VIS: false
    DEGREES: 10.0
    DISABLE_AT_ITER: 120000
    ENABLED: false
    ENABLE_MIXUP: true
    MOSAIC_HEIGHT_RANGE:
    - 512
    - 800
    MOSAIC_WIDTH_RANGE:
    - 512
    - 800
    MSCALE:
    - 0.5
    - 1.5
    NUM_IMAGES: 4
    PERSPECTIVE: 0.0
    POOL_CAPACITY: 1000
    SCALE:
    - 0.5
    - 1.5
    SHEAR: 2.0
    TRANSLATE: 0.1
  RANDOM_FLIP: horizontal
  RESIZE:
    ENABLED: false
    SCALE_JITTER:
    - 0.8
    - 1.2
    SHAPE:
    - 640
    - 640
    TEST_SHAPE:
    - 608
    - 608
  SHIFT:
    SHIFT_PIXELS: 32
MODEL:
  ANCHOR_GENERATOR:
    ANGLES:
    - - -90
      - 0
      - 90
    ASPECT_RATIOS:
    - - 0.5
      - 1.0
      - 2.0
    NAME: DefaultAnchorGenerator
    OFFSET: 0.0
    SIZES:
    - - 32
      - 64
      - 128
      - 256
      - 512
  BACKBONE:
    CHANNEL: 0
    FREEZE_AT: 2
    NAME: build_resnet_fpn_backbone
    SIMPLE: false
    STRIDE: 1
  BIFPN:
    NORM: GN
    NUM_BIFPN: 6
    NUM_LEVELS: 5
    OUT_CHANNELS: 160
    SEPARABLE_CONV: false
  DARKNET:
    DEPTH: 53
    DEPTH_WISE: false
    NORM: BN
    OUT_FEATURES:
    - dark3
    - dark4
    - dark5
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 32
    WEIGHTS: ''
    WITH_CSP: true
  DETR:
    ATTENTION_TYPE: DETR
    BBOX_EMBED_NUM_LAYERS: 3
    CENTERED_POSITION_ENCODIND: false
    CLS_WEIGHT: 1.0
    DECODER_BLOCK_GRAD: true
    DEC_LAYERS: 6
    DEEP_SUPERVISION: true
    DEFORMABLE: false
    DIM_FEEDFORWARD: 2048
    DROPOUT: 0.1
    ENC_LAYERS: 6
    FROZEN_WEIGHTS: ''
    GIOU_WEIGHT: 2.0
    HIDDEN_DIM: 256
    L1_WEIGHT: 5.0
    NHEADS: 8
    NO_OBJECT_WEIGHT: 0.1
    NUM_CLASSES: 80
    NUM_FEATURE_LEVELS: 1
    NUM_OBJECT_QUERIES: 100
    NUM_QUERY_PATTERN: 3
    NUM_QUERY_POSITION: 300
    PRE_NORM: false
    SPATIAL_PRIOR: learned
    TWO_STAGE: false
    USE_FOCAL_LOSS: false
    WITH_BOX_REFINE: false
  DEVICE: cuda
  EFFICIENTNET:
    FEATURE_INDICES:
    - 1
    - 4
    - 10
    - 15
    NAME: efficientnet_b0
    OUT_FEATURES:
    - stride4
    - stride8
    - stride16
    - stride32
    PRETRAINED: true
  FBNET_V2:
    ARCH: default
    ARCH_DEF: []
    NORM: bn
    NORM_ARGS: []
    OUT_FEATURES:
    - trunk3
    SCALE_FACTOR: 1.0
    STEM_IN_CHANNELS: 3
    WIDTH_DIVISOR: 1
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES:
    - res2
    - res3
    - res4
    - res5
    NORM: ''
    OUT_CHANNELS: 256
    OUT_CHANNELS_LIST:
    - 256
    - 512
    - 1024
    REPEAT: 2
  KEYPOINT_ON: false
  LOAD_PROPOSALS: false
  MASK_ON: true
  META_ARCHITECTURE: SOLOv2
  NMS_TYPE: normal
  ONNX_EXPORT: false
  PADDED_VALUE: 114.0
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN:
  - 103.53
  - 116.28
  - 123.675
  PIXEL_STD:
  - 1.0
  - 1.0
  - 1.0
  PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
  REGNETS:
    OUT_FEATURES:
    - s2
    - s3
    - s4
    TYPE: x
  RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    - false
    - false
    - false
    - false
    DEPTH: 50
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES:
    - res2
    - res3
    - res4
    - res5
    R2TYPE: res2net50_v1d
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: true
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.4
    - 0.5
    NMS_THRESH_TEST: 0.5
    NORM: ''
    NUM_CLASSES: 80
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS:
    - - 10.0
      - 10.0
      - 5.0
      - 5.0
    - - 20.0
      - 20.0
      - 10.0
      - 10.0
    - - 30.0
      - 30.0
      - 15.0
      - 15.0
    IOUS:
    - 0.5
    - 0.6
    - 0.7
  ROI_BOX_HEAD:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS:
    - 10.0
    - 10.0
    - 5.0
    - 5.0
    CLS_AGNOSTIC_BBOX_REG: false
    CONV_DIM: 256
    FC_DIM: 1024
    NAME: ''
    NORM: ''
    NUM_CONV: 0
    NUM_FC: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
    SMOOTH_L1_BETA: 0.0
    TRAIN_ON_PRED_BOXES: false
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    NAME: Res5ROIHeads
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 80
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: true
    SCORE_THRESH_TEST: 0.05
  ROI_KEYPOINT_HEAD:
    CONV_DIMS:
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: false
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM: ''
    NUM_CONV: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    BOUNDARY_THRESH: -1
    CONV_DIMS:
    - -1
    HEAD_NAME: StandardRPNHead
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.3
    - 0.7
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 2000
    PRE_NMS_TOPK_TEST: 6000
    PRE_NMS_TOPK_TRAIN: 12000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    COMMON_STRIDE: 4
    CONVS_DIM: 128
    IGNORE_VALUE: 255
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    LOSS_WEIGHT: 1.0
    NAME: SemSegFPNHead
    NORM: GN
    NUM_CLASSES: 54
  SOLOV2:
    FPN_INSTANCE_STRIDES:
    - 8
    - 8
    - 16
    - 32
    - 32
    FPN_SCALE_RANGES:
    - - 1
      - 56
    - - 28
      - 112
    - - 56
      - 224
    - - 112
      - 448
    - - 224
      - 896
    INSTANCE_CHANNELS: 128
    INSTANCE_IN_CHANNELS: 256
    INSTANCE_IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    - p6
    LOSS:
      DICE_WEIGHT: 3.0
      FOCAL_ALPHA: 0.25
      FOCAL_GAMMA: 2.0
      FOCAL_USE_SIGMOID: true
      FOCAL_WEIGHT: 1.0
    MASK_CHANNELS: 128
    MASK_IN_CHANNELS: 256
    MASK_IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    MASK_THR: 0.5
    MAX_PER_IMG: 100
    NMS_KERNEL: gaussian
    NMS_PRE: 500
    NMS_SIGMA: 2
    NMS_TYPE: matrix
    NORM: SyncBN
    NUM_CLASSES: 80
    NUM_GRIDS:
    - 40
    - 36
    - 24
    - 16
    - 12
    NUM_INSTANCE_CONVS: 2
    NUM_KERNELS: 256
    NUM_MASKS: 256
    PRIOR_PROB: 0.01
    SCORE_THR: 0.1
    SIGMA: 0.2
    TYPE_DCN: DCN
    UPDATE_THR: 0.05
    USE_COORD_CONV: true
    USE_DCN_IN_INSTANCE: false
  SPARSE_INST:
    CLS_THRESHOLD: 0.005
    DATASET_MAPPER: SparseInstDatasetMapper
    DECODER:
      GROUPS: 4
      INST:
        CONVS: 4
        DIM: 256
      KERNEL_DIM: 128
      MASK:
        CONVS: 4
        DIM: 256
      NAME: BaseIAMDecoder
      NUM_CLASSES: 80
      NUM_MASKS: 100
      OUTPUT_IAM: false
      SCALE_FACTOR: 2.0
    ENCODER:
      IN_FEATURES:
      - res3
      - res4
      - res5
      NAME: FPNPPMEncoder
      NORM: ''
      NUM_CHANNELS: 256
    LOSS:
      CLASS_WEIGHT: 2.0
      ITEMS:
      - labels
      - masks
      MASK_DICE_WEIGHT: 2.0
      MASK_PIXEL_WEIGHT: 5.0
      NAME: SparseInstCriterion
      OBJECTNESS_WEIGHT: 1.0
    MASK_THRESHOLD: 0.45
    MATCHER:
      ALPHA: 0.8
      BETA: 0.2
      NAME: SparseInstMatcher
    MAX_DETECTIONS: 100
  SWIN:
    DEPTHS:
    - 2
    - 2
    - 6
    - 2
    OUT_FEATURES:
    - 1
    - 2
    - 3
    PATCH: 4
    TYPE: tiny
    WEIGHTS: ''
    WINDOW: 7
  VT_FPN:
    HEADS: 16
    IN_FEATURES:
    - res2
    - res3
    - res4
    - res5
    LAYERS: 3
    MIN_GROUP_PLANES: 64
    NORM: BN
    OUT_CHANNELS: 256
    POS_HWS: []
    POS_N_DOWNSAMPLE: []
    TOKEN_C: 1024
    TOKEN_LS:
    - 16
    - 16
    - 8
    - 8
  WEIGHTS: ''
  YOLO:
    ANCHORS:
    - - - 116
        - 90
      - - 156
        - 198
      - - 373
        - 326
    - - - 30
        - 61
      - - 62
        - 45
      - - 42
        - 119
    - - - 10
        - 13
      - - 16
        - 30
      - - 33
        - 23
    ANCHOR_MASK: []
    BRANCH_DILATIONS:
    - 1
    - 2
    - 3
    CLASSES: 80
    CONF_THRESHOLD: 0.01
    DEPTH_MUL: 1.0
    IGNORE_THRESHOLD: 0.07
    IN_FEATURES:
    - dark3
    - dark4
    - dark5
    IOU_TYPE: ciou
    LOSS:
      ANCHOR_RATIO_THRESH: 4.0
      BUILD_TARGET_TYPE: default
      LAMBDA_CLS: 1.0
      LAMBDA_CONF: 1.0
      LAMBDA_IOU: 1.1
      LAMBDA_WH: 1.0
      LAMBDA_XY: 1.0
      USE_L1: true
    LOSS_TYPE: v4
    MAX_BOXES_NUM: 100
    NECK:
      TYPE: yolov3
      WITH_SPP: false
    NMS_THRESHOLD: 0.5
    NUM_BRANCH: 3
    ORIEN_HEAD:
      UP_CHANNELS: 64
    TEST_BRANCH_IDX: 1
    VARIANT: yolov3
    WIDTH_MUL: 1.0
OUTPUT_DIR: ./output
SEED: -1
SOLVER:
  AMP:
    ENABLED: false
  AMSGRAD: false
  AUTO_SCALING_METHODS:
  - default_scale_d2_configs
  - default_scale_quantization_configs
  BACKBONE_MULTIPLIER: 0.1
  BASE_LR: 0.01
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 5000
  CLIP_GRADIENTS:
    CLIP_TYPE: value
    CLIP_VALUE: 1.0
    ENABLED: false
    NORM_TYPE: 2.0
  GAMMA: 0.1
  IMS_PER_BATCH: 8
  LR_MULTIPLIER_OVERWRITE: []
  LR_SCHEDULER:
    GAMMA: 0.1
    MAX_EPOCH: 500
    MAX_ITER: 40000
    NAME: WarmupMultiStepLR
    STEPS:
    - 30000
    WARMUP_FACTOR: 0.001
    WARMUP_ITERS: 1000
    WARMUP_METHOD: linear
  LR_SCHEDULER_NAME: WarmupMultiStepLR
  MAX_ITER: 90000
  MOMENTUM: 0.9
  NESTEROV: false
  OPTIMIZER: ADAMW
  REFERENCE_WORLD_SIZE: 8
  STEPS:
  - 60000
  - 80000
  WARMUP_FACTOR: 0.01
  WARMUP_ITERS: 1000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: null
  WEIGHT_DECAY_EMBED: 0.0
  WEIGHT_DECAY_NORM: 0.0
TEST:
  AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    - 400
    - 500
    - 600
    - 700
    - 800
    - 900
    - 1000
    - 1100
    - 1200
  DETECTIONS_PER_IMAGE: 100
  EVAL_PERIOD: 0
  EXPECTED_RESULTS: []
  KEYPOINT_OKS_SIGMAS: []
  PRECISE_BN:
    ENABLED: false
    NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0

[05/09 13:50:06 detectron2]: Full config saved to ./output/config.yaml
[05/09 13:50:06 d2.utils.env]: Using a generated random seed 6550842
[05/09 13:50:06 d2.engine.defaults]: Auto-scaling the config to batch_size=1, learning_rate=0.00125, max_iter=720000, warmup=8000.
13:50:06 05.09 INFO solov2.py:83]: instance_shapes: [ShapeSpec(channels=256, height=None, width=None, stride=4), ShapeSpec(channels=256, height=None, width=None, stride=8), ShapeSpec(channels=256, height=None, width=None, stride=16), ShapeSpec(channels=256, height=None, width=None, stride=32), ShapeSpec(channels=256, height=None, width=None, stride=64)]
[05/09 13:50:08 d2.data.datasets.coco]: Loaded 87 images in COCO format from /home/ws/data/dataset/nets_kinneret_only24_2/train_coco.json
[05/09 13:50:08 d2.data.build]: Removed 0 images with no usable annotations. 87 images left.
[05/09 13:50:08 d2.data.build]: Distribution of instances among all 3 categories:
|  category  | #instances   |  category  | #instances   |  category  | #instances   |
|:----------:|:-------------|:----------:|:-------------|:----------:|:-------------|
|    car     | 1305         |    bus     | 0            |   truck    | 0            |
|            |              |            |              |            |              |
|   total    | 1305         |            |              |            |              |
[05/09 13:50:08 d2.data.build]: Using training sampler TrainingSampler
[05/09 13:50:08 d2.data.common]: Serializing 87 elements to byte tensors and concatenating them all ...
[05/09 13:50:08 d2.data.common]: Serialized dataset takes 0.82 MiB
[05/09 13:50:08 fvcore.common.checkpoint]: No checkpoint found. Initializing model from scratch
[05/09 13:50:08 d2.engine.train_loop]: Starting training from iteration 0
(15, 768, 768)
/home/ws/.local/lib/python3.6/site-packages/detectron2/structures/image_list.py:88: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  max_size = (max_size + (stride - 1)) // stride * stride
[(768, 768)]
torch.Size([1, 3, 768, 768])
/home/ws/.local/lib/python3.6/site-packages/torch/nn/functional.py:3635: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode)
/home/ws/.local/lib/python3.6/site-packages/torch/nn/functional.py:3680: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
  "The default behavior for interpolate/upsample with float scale_factor changed "
/home/ws/.local/lib/python3.6/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/home/ws/PycharmProjects/yolov7/yolov7/modeling/meta_arch/solov2.py:300: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  (center_w / upsampled_size[1]) // (1. / num_grid))
/home/ws/PycharmProjects/yolov7/yolov7/modeling/meta_arch/solov2.py:302: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  (center_h / upsampled_size[0]) // (1. / num_grid))
/home/ws/PycharmProjects/yolov7/yolov7/modeling/meta_arch/solov2.py:306: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  0, int(((center_h - half_h) / upsampled_size[0]) // (1. / num_grid)))
/home/ws/PycharmProjects/yolov7/yolov7/modeling/meta_arch/solov2.py:308: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  num_grid - 1, int(((center_h + half_h) / upsampled_size[0]) // (1. / num_grid)))
/home/ws/PycharmProjects/yolov7/yolov7/modeling/meta_arch/solov2.py:310: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  0, int(((center_w - half_w) / upsampled_size[1]) // (1. / num_grid)))
/home/ws/PycharmProjects/yolov7/yolov7/modeling/meta_arch/solov2.py:312: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  num_grid - 1, int(((center_w + half_w) / upsampled_size[1]) // (1. / num_grid)))
ERROR [05/09 13:50:12 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/ws/.local/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "train_net.py", line 58, in run_step
    self._trainer.run_step()
  File "/home/ws/.local/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 285, in run_step
    losses.backward()
  File "/home/ws/.local/lib/python3.6/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/ws/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 128, 192, 192]], which is output 0 of ReluBackward0, is at version 3; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
[05/09 13:50:12 d2.engine.hooks]: Total training time: 0:00:03 (0:00:00 on hooks)
[05/09 13:50:12 d2.utils.events]:  iter: 0    lr: N/A  max_mem: 710M
Traceback (most recent call last):
  File "train_net.py", line 133, in <module>
    args=(args,),
  File "/home/ws/.local/lib/python3.6/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "train_net.py", line 121, in main
    return trainer.train()
  File "/home/ws/.local/lib/python3.6/site-packages/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/ws/.local/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "train_net.py", line 58, in run_step
    self._trainer.run_step()
  File "/home/ws/.local/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 285, in run_step
    losses.backward()
  File "/home/ws/.local/lib/python3.6/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/ws/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 128, 192, 192]], which is output 0 of ReluBackward0, is at version 3; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

May 09 '22 10:05 sdimantsd

Can you try to find out which op is being done in place? I haven't run into this before.

May 09 '22 11:05 lucasjinreal

Try changing this line https://github.com/jinfagang/yolov7/blob/f9c0b723be90bc3fbf7955f1d2c0344d5f52c5e1/yolov7/modeling/head/solov2_head.py#L264 to feature_add_all_level = feature_add_all_level + self.convs_all_levels[i](mask_feat) so that the addition is out of place.
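The traceback names a tensor that is "output 0 of ReluBackward0" at version 3, which is what repeated in-place additions onto a ReLU output would produce. Below is a minimal sketch of that failure and of the out-of-place fix, assuming PyTorch is installed. The small conv branches and all function names are illustrative stand-ins, not the repo's actual mask head.

```python
import torch
import torch.nn as nn


def make_branches(num_levels=4, channels=4):
    """Hypothetical per-level branches; each one ends in a ReLU."""
    return nn.ModuleList([
        nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        for _ in range(num_levels)
    ])


def sum_inplace(branches, feats):
    out = branches[0](feats[0])            # ReLU output, saved for ReLU's backward
    for i in range(1, len(branches)):
        out += branches[i](feats[i])       # in-place add bumps the saved tensor's version
    return out


def sum_out_of_place(branches, feats):
    out = branches[0](feats[0])
    for i in range(1, len(branches)):
        out = out + branches[i](feats[i])  # allocates a new tensor; saved output untouched
    return out


def backward_ok(fn, branches, feats):
    """Return True if forward + backward completes, False on autograd's RuntimeError."""
    try:
        fn(branches, feats).sum().backward()
        return True
    except RuntimeError:
        return False


torch.manual_seed(0)
branches = make_branches()
feats = [torch.randn(1, 4, 8, 8) for _ in range(4)]
print("in-place backward ok:", backward_ok(sum_inplace, branches, feats))
print("out-of-place backward ok:", backward_ok(sum_out_of_place, branches, feats))
```

With four levels the in-place variant modifies the first ReLU output three times, which lines up with "is at version 3; expected version 0" in the log.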

May 09 '22 12:05 acai66

@acai66 Can you open a PR for SOLOv2 if this works?

May 09 '22 14:05 lucasjinreal

@sdimantsd Can you please pull and try again? It should be fixed.

May 10 '22 06:05 lucasjinreal

@jinfagang Now the masks are OK and training does not crash. But it only displays the images and does not start training.

May 10 '22 08:05 sdimantsd

If I change the line in solov2.py from: im = visualize_det_cv2_part(im, None, clss, bboxes, is_show=True) to im = visualize_det_cv2_part(im, None, clss, bboxes, is_show=False) (that is, is_show=False), it works. But I think all of this:

        for a in batched_inputs:
            img = a["image"].cpu().permute(1, 2, 0).numpy().astype(np.uint8)
            ins = a['instances']
            bboxes = ins.gt_boxes.tensor.cpu().numpy().astype(int)
            clss = ins.gt_classes.cpu().numpy()
            im = img.copy()
            bit_masks = ins.gt_masks.tensor.cpu().numpy()
            print(bit_masks.shape)
            # img = vis_bitmasks_with_classes(img, clss, bit_masks)
            im = vis_bitmasks(im, bit_masks)
            im = visualize_det_cv2_part(im, None, clss, bboxes, is_show=False)

and this:

        print(images.image_sizes)
        print(images.tensor.shape)

are unnecessary during training.
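As an alternative to commenting the debug code out, it could be kept behind an opt-in switch so that it never blocks a normal training run. This is only a sketch: the environment variable SOLOV2_DEBUG_VIS and both function names are hypothetical and do not exist in the repo, and visualize stands in for whatever drawing routine is used.

```python
import os


def debug_vis_enabled(env_var="SOLOV2_DEBUG_VIS"):
    """Hypothetical switch: visualization runs only when the variable is set to '1'."""
    return os.environ.get(env_var, "0") == "1"


def maybe_visualize(batched_inputs, visualize, enabled=None):
    """Call visualize(item) for each input only when debugging is enabled.

    Returns the number of items that were visualized.
    """
    if enabled is None:
        enabled = debug_vis_enabled()
    if not enabled:
        return 0
    shown = 0
    for item in batched_inputs:
        visualize(item)
        shown += 1
    return shown


# Usage with a stand-in visualizer that just records what it was given.
seen = []
print(maybe_visualize(["img0", "img1"], seen.append, enabled=False))
print(maybe_visualize(["img0", "img1"], seen.append, enabled=True))
```

With the switch off, the loop body is skipped entirely, so no window is opened and nothing is printed per batch.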

May 10 '22 08:05 sdimantsd

@sdimantsd You are right, these lines are for debugging whether the GT is right. Can you verify from the dataloader whether the GT is right?

You can send me a PR if you verify that the GT is right; just comment out these lines.

May 10 '22 08:05 lucasjinreal

Hi @jinfagang Thanks! It looks like the GT is good, and training has started. I will let you know if the overfitting works. Thanks

May 10 '22 14:05 sdimantsd

Hi @jinfagang I am trying to overfit SOLOv2, but it's not working. I changed the dataset to a custom dataset with only 3 labels (car, bus, truck). These are the last lines of the log:

[05/11 11:27:11 d2.utils.events]:  eta: 0:01:00  iter: 719399  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0019  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:13 d2.utils.events]:  eta: 0:00:58  iter: 719419  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0019  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:16 d2.utils.events]:  eta: 0:00:56  iter: 719439  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0019  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:18 d2.utils.events]:  eta: 0:00:54  iter: 719459  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:20 d2.utils.events]:  eta: 0:00:52  iter: 719479  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:22 d2.utils.events]:  eta: 0:00:50  iter: 719499  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:24 d2.utils.events]:  eta: 0:00:48  iter: 719519  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:26 d2.utils.events]:  eta: 0:00:46  iter: 719539  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:28 d2.utils.events]:  eta: 0:00:44  iter: 719559  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:30 d2.utils.events]:  eta: 0:00:42  iter: 719579  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:32 d2.utils.events]:  eta: 0:00:40  iter: 719599  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:34 d2.utils.events]:  eta: 0:00:38  iter: 719619  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:36 d2.utils.events]:  eta: 0:00:36  iter: 719639  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:38 d2.utils.events]:  eta: 0:00:34  iter: 719659  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:40 d2.utils.events]:  eta: 0:00:32  iter: 719679  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:42 d2.utils.events]:  eta: 0:00:30  iter: 719699  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0019  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:44 d2.utils.events]:  eta: 0:00:28  iter: 719719  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:46 d2.utils.events]:  eta: 0:00:26  iter: 719739  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:48 d2.utils.events]:  eta: 0:00:24  iter: 719759  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:50 d2.utils.events]:  eta: 0:00:22  iter: 719779  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:52 d2.utils.events]:  eta: 0:00:20  iter: 719799  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:54 d2.utils.events]:  eta: 0:00:18  iter: 719819  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:56 d2.utils.events]:  eta: 0:00:16  iter: 719839  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:27:58 d2.utils.events]:  eta: 0:00:14  iter: 719859  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:28:00 d2.utils.events]:  eta: 0:00:12  iter: 719879  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:28:02 d2.utils.events]:  eta: 0:00:10  iter: 719899  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:28:04 d2.utils.events]:  eta: 0:00:08  iter: 719919  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:28:06 d2.utils.events]:  eta: 0:00:06  iter: 719939  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M
[05/11 11:28:08 d2.utils.events]:  eta: 0:00:04  iter: 719959  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0020  lr: 1.25e-05  max_mem: 861M
[05/11 11:28:10 d2.utils.events]:  eta: 0:00:02  iter: 719979  total_loss: 3.028  loss_ins: 2.591  loss_cate: 0.4368  time: 0.1018  data_time: 0.0018  lr: 1.25e-05  max_mem: 861M

The loss is not going down.

And when I run demo.py on the image, it does not detect any objects. Can you help me, please?

May 11 '22 09:05 sdimantsd

@sdimantsd Can you please provide your machine config? I think this is a gradient explosion. Clearly it won't learn anything.

May 11 '22 13:05 lucasjinreal

If I change the line in solov2.py from: im = visualize_det_cv2_part(im, None, clss, bboxes, is_show=True) to: im = visualize_det_cv2_part(im, None, clss, bboxes, is_show=False) (that is, is_show=False), it works. But I think all of this:

      for a in batched_inputs:
           img = a["image"].cpu().permute(1, 2, 0).numpy().astype(np.uint8)
           ins = a['instances']
           bboxes = ins.gt_boxes.tensor.cpu().numpy().astype(int)
           clss = ins.gt_classes.cpu().numpy()
           im = img.copy()
           bit_masks = ins.gt_masks.tensor.cpu().numpy()
           print(bit_masks.shape)
           # img = vis_bitmasks_with_classes(img, clss, bit_masks)
           im = vis_bitmasks(im, bit_masks)
           im = visualize_det_cv2_part(im, None, clss, bboxes, is_show=False)

and this:

        print(images.image_sizes)
        print(images.tensor.shape)

is unnecessary during the training

But when I close the visualization during training, the loss becomes NaN. While the visualization is running, the displayed image and mask are all correct.

Aug 08 '22 09:08 visionKinger

f"Loss became infinite or NaN at iteration={storage.iter}!\n"

FloatingPointError: Loss became infinite or NaN at iteration=1! loss_dict = {'loss_ins': 2.8118667602539062, 'loss_cate': nan}

After applying the change mentioned in this thread (changing True to False), I am also getting the same error. If anyone was able to solve it (@sdimantsd, @visionKinger), could you please help? Thanks!
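To see which loss term goes bad first, a small check like the one below can help. It is only a sketch: the helper name `non_finite_losses` is hypothetical and not part of detectron2 or yolov7_d2, and the values are copied from the error message above.

```python
import math


def non_finite_losses(loss_dict):
    """Return the names of loss terms whose value is NaN or infinite.

    Hypothetical helper; values may be Python floats or anything that
    float() can convert (for example scalar tensors).
    """
    return [
        name
        for name, value in loss_dict.items()
        if not math.isfinite(float(value))
    ]


# Values taken from the FloatingPointError message in this thread.
loss_dict = {"loss_ins": 2.8118667602539062, "loss_cate": float("nan")}
print(non_finite_losses(loss_dict))
```

Here only `loss_cate` is reported, which suggests looking at the classification branch and the ground-truth class labels before anything else.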

Aug 19 '22 11:08 Sahil028

Please try lowering the learning rate (lr).
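A minimal sketch of how a lower learning rate could be passed without editing the YAML file. It assumes that train_net.py merges trailing KEY VALUE pairs into the config the way stock detectron2 scripts do (the `opts=[]` entry in the printed command-line args suggests it accepts them). `SOLVER.BASE_LR` is the standard detectron2 key; the helper name, the starting value 0.01 and the factor 0.1 are hypothetical.

```python
def lr_override_opts(base_lr, factor=0.1):
    """Build detectron2-style KEY VALUE tokens for a reduced learning rate.

    Hypothetical helper: base_lr is whatever the config currently uses,
    factor is how much to shrink it (0 < factor <= 1).
    """
    if base_lr <= 0 or not 0 < factor <= 1:
        raise ValueError("base_lr must be > 0 and factor must be in (0, 1]")
    return ["SOLVER.BASE_LR", f"{base_lr * factor:g}"]


# Assumed starting point of 0.01; read the real value from your config.
opts = lr_override_opts(0.01, factor=0.1)
command = (
    "python3 train_net.py --config-file configs/coco-instance/solov2_lite.yaml "
    + " ".join(opts)
)
print(command)
```

Lowering the rate in steps of roughly 10x and watching the first few hundred iterations is usually enough to tell whether the NaN comes from the step size or from the data.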

Aug 19 '22 13:08 lucasjinreal

Hi @jinfagang, thanks for your answer. I have been experimenting with solov2_lite and your suggestions have been really helpful. I was able to run it for roughly 22 epochs; here is part of the training log:

[09/10 17:34:31 d2.utils.events]: eta: 0:13:22 iter: 6679 total_loss: 3.368 loss_ins: 2.948 loss_cate: 0.4127 time: 1.6192 data_time: 0.0553 lr: 1.25e-09 max_mem: 9312M
[09/10 17:35:03 d2.utils.events]: eta: 0:12:51 iter: 6699 total_loss: 3.385 loss_ins: 2.951 loss_cate: 0.4373 time: 1.6192 data_time: 0.0565 lr: 1.25e-09 max_mem: 9312M
[09/10 17:35:36 d2.utils.events]: eta: 0:12:20 iter: 6719 total_loss: 3.346 loss_ins: 2.945 loss_cate: 0.4056 time: 1.6192 data_time: 0.0437 lr: 1.25e-09 max_mem: 9312M
[09/10 17:36:09 d2.utils.events]: eta: 0:11:50 iter: 6739 total_loss: 3.374 loss_ins: 2.945 loss_cate: 0.428 time: 1.6194 data_time: 0.0484 lr: 1.25e-09 max_mem: 9312M
[09/10 17:36:42 d2.utils.events]: eta: 0:11:19 iter: 6759 total_loss: 3.361 loss_ins: 2.945 loss_cate: 0.416 time: 1.6194 data_time: 0.0525 lr: 1.25e-09 max_mem: 9312M
[09/10 17:37:14 d2.utils.events]: eta: 0:10:48 iter: 6779 total_loss: 3.382 loss_ins: 2.95 loss_cate: 0.4243 time: 1.6193 data_time: 0.0508 lr: 1.25e-09 max_mem: 9312M
[09/10 17:37:46 d2.utils.events]: eta: 0:10:16 iter: 6799 total_loss: 3.348 loss_ins: 2.948 loss_cate: 0.4043 time: 1.6193 data_time: 0.0551 lr: 1.25e-09 max_mem: 9312M
[09/10 17:38:19 d2.utils.events]: eta: 0:09:45 iter: 6819 total_loss: 3.369 loss_ins: 2.946 loss_cate: 0.4279 time: 1.6193 data_time: 0.0545 lr: 1.25e-09 max_mem: 9312M
[09/10 17:38:52 d2.utils.events]: eta: 0:09:14 iter: 6839 total_loss: 3.379 loss_ins: 2.94 loss_cate: 0.4413 time: 1.6195 data_time: 0.0488 lr: 1.25e-09 max_mem: 9312M
[09/10 17:39:23 d2.utils.events]: eta: 0:08:43 iter: 6859 total_loss: 3.367 loss_ins: 2.944 loss_cate: 0.4301 time: 1.6192 data_time: 0.0518 lr: 1.25e-09 max_mem: 9312M
[09/10 17:39:55 d2.utils.events]: eta: 0:08:13 iter: 6879 total_loss: 3.366 loss_ins: 2.945 loss_cate: 0.4096 time: 1.6193 data_time: 0.0510 lr: 1.25e-09 max_mem: 9312M
[09/10 17:40:26 d2.utils.events]: eta: 0:07:42 iter: 6899 total_loss: 3.372 loss_ins: 2.946 loss_cate: 0.4224 time: 1.6190 data_time: 0.0594 lr: 1.25e-09 max_mem: 9312M
[09/10 17:40:58 d2.utils.events]: eta: 0:07:11 iter: 6919 total_loss: 3.374 loss_ins: 2.944 loss_cate: 0.4191 time: 1.6189 data_time: 0.0555 lr: 1.25e-09 max_mem: 9312M
[09/10 17:41:30 d2.utils.events]: eta: 0:06:40 iter: 6939 total_loss: 3.394 loss_ins: 2.95 loss_cate: 0.4397 time: 1.6189 data_time: 0.0476 lr: 1.25e-09 max_mem: 9312M
[09/10 17:42:02 d2.utils.events]: eta: 0:06:09 iter: 6959 total_loss: 3.368 loss_ins: 2.948 loss_cate: 0.4201 time: 1.6189 data_time: 0.0553 lr: 1.25e-09 max_mem: 9312M
[09/10 17:42:37 d2.utils.events]: eta: 0:05:38 iter: 6979 total_loss: 3.375 loss_ins: 2.947 loss_cate: 0.4291 time: 1.6191 data_time: 0.0514 lr: 1.25e-09 max_mem: 9312M
[09/10 17:43:10 d2.utils.events]: eta: 0:05:08 iter: 6999 total_loss: 3.362 loss_ins: 2.948 loss_cate: 0.4324 time: 1.6194 data_time: 0.0621 lr: 1.25e-09 max_mem: 9312M
[09/10 17:43:44 d2.utils.events]: eta: 0:04:37 iter: 7019 total_loss: 3.407 loss_ins: 2.947 loss_cate: 0.4632 time: 1.6195 data_time: 0.0589 lr: 1.25e-09 max_mem: 9312M
[09/10 17:44:18 d2.utils.events]: eta: 0:04:06 iter: 7039 total_loss: 3.375 loss_ins: 2.948 loss_cate: 0.4275 time: 1.6197 data_time: 0.0561 lr: 1.25e-09 max_mem: 9312M
[09/10 17:44:50 d2.utils.events]: eta: 0:03:36 iter: 7059 total_loss: 3.372 loss_ins: 2.945 loss_cate: 0.4346 time: 1.6197 data_time: 0.0505 lr: 1.25e-09 max_mem: 9312M
[09/10 17:45:22 d2.utils.events]: eta: 0:03:05 iter: 7079 total_loss: 3.376 loss_ins: 2.948 loss_cate: 0.4245 time: 1.6197 data_time: 0.0525 lr: 1.25e-09 max_mem: 9312M
[09/10 17:45:54 d2.utils.events]: eta: 0:02:34 iter: 7099 total_loss: 3.354 loss_ins: 2.943 loss_cate: 0.4098 time: 1.6196 data_time: 0.0537 lr: 1.25e-09 max_mem: 9312M
[09/10 17:46:28 d2.utils.events]: eta: 0:02:03 iter: 7119 total_loss: 3.386 loss_ins: 2.949 loss_cate: 0.4338 time: 1.6198 data_time: 0.0537 lr: 1.25e-09 max_mem: 9312M
[09/10 17:47:00 d2.utils.events]: eta: 0:01:32 iter: 7139 total_loss: 3.362 loss_ins: 2.947 loss_cate: 0.4166 time: 1.6197 data_time: 0.0488 lr: 1.25e-09 max_mem: 9312M
[09/10 17:47:33 d2.utils.events]: eta: 0:01:01 iter: 7159 total_loss: 3.36 loss_ins: 2.946 loss_cate: 0.4138 time: 1.6198 data_time: 0.0530 lr: 1.25e-09 max_mem: 9312M

and logs for the validation set:

[09/10 17:50:53 d2.evaluation.coco_evaluation]: Evaluation results for segm:

AP	AP50	AP75	APs	APm	APl
0.000	0.000	0.000	0.000	0.000	0.000
[09/10 17:50:53 d2.evaluation.coco_evaluation]: Per-category segm AP:
category	AP	category	AP	category	AP
:-----------	:------	:-----------	:------	:-----------	:------
circle	0.000	poly	0.000	line	0.000
parabola	0.000
[09/10 17:50:53 d2.engine.defaults]: Evaluation results for val in csv format:
[09/10 17:50:53 d2.evaluation.testing]: copypaste: Task: bbox
[09/10 17:50:53 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[09/10 17:50:53 d2.evaluation.testing]: copypaste: 0.0356,0.2245,0.0039,0.6089,1.0985,0.0605
[09/10 17:50:53 d2.evaluation.testing]: copypaste: Task: segm
[09/10 17:50:53 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[09/10 17:50:53 d2.evaluation.testing]: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000

Could you please suggest where I should look, or explain why it's not giving any results? Thanks!
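To check whether the loss is really flat, the training log can be parsed instead of read by eye. The sketch below assumes the `d2.utils.events` line layout shown in this thread; the function names are hypothetical, and the two sample lines are the first and last entries of the log above.

```python
import re

# Matches the iter, total_loss and lr fields of a d2.utils.events line.
EVENT_RE = re.compile(
    r"iter:\s*(\d+)\s+total_loss:\s*([0-9.]+).*?lr:\s*([0-9.eE+-]+)",
    re.DOTALL,
)


def parse_events(log_text):
    """Return a list of (iteration, total_loss, lr) tuples found in log_text."""
    return [
        (int(it), float(loss), float(lr))
        for it, loss, lr in EVENT_RE.findall(log_text)
    ]


def loss_change(rows):
    """Difference between the last and the first total_loss."""
    return rows[-1][1] - rows[0][1]


log_text = (
    "[09/10 17:34:31 d2.utils.events]: eta: 0:13:22 iter: 6679 total_loss: 3.368 "
    "loss_ins: 2.948 loss_cate: 0.4127 time: 1.6192 data_time: 0.0553 "
    "lr: 1.25e-09 max_mem: 9312M "
    "[09/10 17:47:33 d2.utils.events]: eta: 0:01:01 iter: 7159 total_loss: 3.36 "
    "loss_ins: 2.946 loss_cate: 0.4138 time: 1.6198 data_time: 0.0530 "
    "lr: 1.25e-09 max_mem: 9312M"
)
rows = parse_events(log_text)
print(rows)
print("total_loss change:", round(loss_change(rows), 4))
```

Over these 480 iterations the total loss moves by less than 0.01 while the logged lr is 1.25e-09. A learning rate that small may leave the weights almost unchanged between iterations, which would be consistent with the flat loss, so the lr value is worth checking alongside the data.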

Sep 12 '22 02:09 Sahil028

You should register your custom dataset and train with the custom data script.

Sep 12 '22 03:09 lucasjinreal

Yes, I did that already; only then was I able to train. Here's how I registered the data:

    # registering the data set
    register_coco_instances("train", {}, "/content/yolov7_d2/dataset/math_data/train/train.json", "/content/yolov7_d2/dataset/math_data")
    register_coco_instances("val", {}, "/content/yolov7_d2/dataset/math_data/val/val.json", "/content/yolov7_d2/dataset/math_data")

and made the corresponding changes in the train_inseg file, adding these few lines at the start of the file:

    import os  # needed for os.path.join below

    from detectron2.data.datasets.coco import load_coco_json, register_coco_instances
    from train_det import Trainer, setup


    def register_custom_datasets():
        # facemask dataset
        DATASET_ROOT = "./dataset/math_data"
        ANN_ROOT = DATASET_ROOT
        TRAIN_PATH = os.path.join(ANN_ROOT, "train")
        VAL_PATH = os.path.join(ANN_ROOT, "val")
        TRAIN_JSON = os.path.join(TRAIN_PATH, "train.json")
        VAL_JSON = os.path.join(VAL_PATH, "val.json")
        # Register both splits: name, metadata, annotation file, image root.
        register_coco_instances("train", {}, TRAIN_JSON, TRAIN_PATH)
        register_coco_instances("val", {}, VAL_JSON, VAL_PATH)


    register_custom_datasets()

Thanks for the quick response. Could you please suggest anything else that I should look at?
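Since the segm AP is 0 for every category, one thing that may be worth ruling out is the annotation file itself. The sketch below summarizes a COCO-style dict and counts annotations with an empty or missing `segmentation` field. The helper names and the tiny in-memory sample are hypothetical (only the category names come from the evaluation log above); it assumes the standard COCO keys `images`, `categories` and `annotations`.

```python
import json
from collections import Counter


def summarize_coco(coco):
    """Count annotations per category and flag annotations without a mask."""
    names = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}
    per_category = Counter()
    missing_masks = 0
    unknown_category_ids = set()
    for ann in coco.get("annotations", []):
        cat_id = ann.get("category_id")
        if cat_id in names:
            per_category[names[cat_id]] += 1
        else:
            unknown_category_ids.add(cat_id)
        # An empty list or a missing key both mean there is no mask to learn from.
        if not ann.get("segmentation"):
            missing_masks += 1
    return {
        "images": len(coco.get("images", [])),
        "annotations": len(coco.get("annotations", [])),
        "per_category": dict(per_category),
        "missing_masks": missing_masks,
        "unknown_category_ids": sorted(unknown_category_ids, key=str),
    }


def summarize_coco_file(path):
    """Load a COCO JSON file from disk and summarize it."""
    with open(path, "r", encoding="utf-8") as handle:
        return summarize_coco(json.load(handle))


# Hypothetical two-annotation sample; the second annotation has no mask.
sample = {
    "images": [{"id": 1, "file_name": "0001.png", "width": 64, "height": 64}],
    "categories": [{"id": 1, "name": "circle"}, {"id": 2, "name": "line"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [4, 4, 20, 20],
         "segmentation": [[4, 4, 24, 4, 24, 24, 4, 24]], "area": 400, "iscrowd": 0},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [30, 30, 10, 2],
         "segmentation": [], "area": 20, "iscrowd": 0},
    ],
}
report = summarize_coco(sample)
print(report)
```

Running `summarize_coco_file` on train.json and val.json and comparing the two reports would show whether masks are present and whether both splits use the same category ids.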

Sep 12 '22 03:09 Sahil028

Can you try training on COCO first? For a tiny dataset, I think the lr is very hard to tune. You can join our Discord for further guidance.

Sep 12 '22 05:09 lucasjinreal

Sure, will try and update. Thanks !!!

Sep 12 '22 06:09 Sahil028
