
LayoutLMv3 | Index Error While Training On Custom Dataset

Open jordanparker6 opened this issue 2 years ago • 1 comments

Describe the bug: LayoutLMv3

The problem arises when using:

  • [x] the official example scripts: (give details below)
  • [x] my own modified scripts: (give details below)

LayoutLMv3 runs into an error during training. I am using the provided scripts, but I have altered the config to point to a custom COCO dataset.

	python3 layoutlmv3/object_detection/train.py \
		--config layoutlmv3/object_detection/cascade_layoutlmv3.yaml \
		--num-gpus 0 \
		MODEL.WEIGHTS ./models/layoutlmv3-base/pytorch_model.bin \
		OUTPUT_DIR ./training/output \
		PUBLAYNET_DATA_DIR_TRAIN ./training/train/ \
		PUBLAYNET_DATA_DIR_TEST ./training/val/ \
		SOLVER.IMS_PER_BATCH 2 \
		SOLVER.BASE_LR 0.0025 \
		MODEL.DEVICE cpu

Error message below.

My hunch is that it is connected to the configured shape of the fully connected classification layer, but I thought Detectron2 parameterised that on startup.
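For what it's worth, a common cause of `Target N is out of bounds` in Detectron2 is a mismatch between `ROI_HEADS.NUM_CLASSES` in the config and the number of categories in the COCO annotation file. A minimal, self-contained sanity check is sketched below; the toy `coco` dict is a stand-in for the real annotation JSON, not data from this repo:

```python
def check_num_classes(coco: dict, num_classes: int):
    """Compare a COCO-style dict's category count to ROI_HEADS.NUM_CLASSES.

    Detectron2 remaps arbitrary category ids onto the contiguous range
    [0, #categories), so cross-entropy targets can reach #categories - 1;
    NUM_CLASSES must therefore equal #categories.
    """
    n_cats = len(coco["categories"])
    return n_cats, n_cats == num_classes

# Toy stand-in for the real annotation JSON: 20 categories with ids 1..20.
coco = {"categories": [{"id": i, "name": f"cat{i}"} for i in range(1, 21)]}

n_cats, ok = check_num_classes(coco, num_classes=5)
print(n_cats, ok)  # 20 False
```

If the check reports a mismatch, setting `ROI_HEADS.NUM_CLASSES` to the reported category count should be the first thing to try.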

Traceback (most recent call last):
  File "contractron/models/layoutlmv3/object_detection/train.py", line 124, in <module>
    args=(args,),
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "contractron/models/layoutlmv3/object_detection/train.py", line 97, in main
    return trainer.train()
  File "/home/jupyter/contractron/contractron/models/layoutlmv3/object_detection/ditod/mytrainer.py", line 495, in train
    super().train(self.start_iter, self.max_iter)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/jupyter/contractron/contractron/models/layoutlmv3/object_detection/ditod/mytrainer.py", line 506, in run_step
    self._trainer.run_step()
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 413, in run_step
    loss_dict = self.model(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jupyter/contractron/contractron/models/layoutlmv3/object_detection/ditod/rcnn_vl.py", line 74, in forward
    _, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/roi_heads/cascade_rcnn.py", line 144, in forward
    losses = self._forward_box(features, proposals, targets)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/roi_heads/cascade_rcnn.py", line 183, in _forward_box
    stage_losses = predictor.losses(predictions, proposals)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/roi_heads/fast_rcnn.py", line 344, in losses
    loss_cls = cross_entropy(scores, gt_classes, reduction="mean")
  File "/opt/conda/lib/python3.7/site-packages/detectron2/layers/wrappers.py", line 56, in wrapped_loss_func
    return loss_func(input, target, reduction=reduction, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 3014, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
IndexError: Target 9 is out of bounds.
make: *** [Makefile:41: train-layoutlmv3-object-detection-cpu] Error 1

Full output below:

python3 contractron/models/layoutlmv3/object_detection/train.py \
        --config ./contractron/models/layoutlmv3/object_detection/cascade_layoutlmv3.yaml \
        --num-gpus 0 \
        MODEL.WEIGHTS ./models/layoutlmv3-base/pytorch_model.bin \
        OUTPUT_DIR ./training/output \
        PUBLAYNET_DATA_DIR_TRAIN ./training/train/ \
        PUBLAYNET_DATA_DIR_TEST ./training/val/ \
        SOLVER.IMS_PER_BATCH 2 \
        SOLVER.BASE_LR 0.0025 \
        MODEL.DEVICE cpu
Command Line Args: Namespace(config_file='./contractron/models/layoutlmv3/object_detection/cascade_layoutlmv3.yaml', debug=False, dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=0, num_machines=1, opts=['MODEL.WEIGHTS', './models/layoutlmv3-base/pytorch_model.bin', 'OUTPUT_DIR', './training/output', 'PUBLAYNET_DATA_DIR_TRAIN', './training/data/', 'PUBLAYNET_DATA_DIR_TEST', './training/data/', 'SOLVER.IMS_PER_BATCH', '2', 'SOLVER.BASE_LR', '0.0025', 'MODEL.DEVICE', 'cpu'], resume=False)
[08/08 12:03:16 detectron2]: Rank of current process: 0. World size: 1
[08/08 12:03:19 detectron2]: Environment info:
----------------------  ---------------------------------------------------------------------------------------------------
sys.platform            linux
Python                  3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) [GCC 9.4.0]
numpy                   1.19.5
detectron2              0.6 @/opt/conda/lib/python3.7/site-packages/detectron2
detectron2._C           not built correctly: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
Compiler ($CXX)         c++ (Debian 8.3.0-6) 8.3.0
CUDA compiler           Build cuda_11.3.r11.3/compiler.29920130_0
detectron2 arch flags   7.5
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.12.1+cu102 @/opt/conda/lib/python3.7/site-packages/torch
PyTorch debug build     False
GPU available           Yes
GPU 0,1,2,3             Tesla T4 (arch=7.5)
Driver version          470.57.02
CUDA_HOME               /usr/local/cuda
Pillow                  9.1.1
torchvision             0.13.1+cu102 @/opt/conda/lib/python3.7/site-packages/torchvision
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5
fvcore                  0.1.5.post20220512
iopath                  0.1.9
cv2                     Not found
----------------------  ---------------------------------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2022.1-Product Build 20220311 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

[08/08 12:03:19 detectron2]: Command line arguments: Namespace(config_file='./contractron/models/layoutlmv3/object_detection/cascade_layoutlmv3.yaml', debug=False, dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=0, num_machines=1, opts=['MODEL.WEIGHTS', './models/layoutlmv3-base/pytorch_model.bin', 'OUTPUT_DIR', './training/output', 'PUBLAYNET_DATA_DIR_TRAIN', './training/data/', 'PUBLAYNET_DATA_DIR_TEST', './training/data/', 'SOLVER.IMS_PER_BATCH', '2', 'SOLVER.BASE_LR', '0.0025', 'MODEL.DEVICE', 'cpu'], resume=False)
[08/08 12:03:19 detectron2]: Contents of args.config_file=./contractron/models/layoutlmv3/object_detection/cascade_layoutlmv3.yaml:
MODEL:
  MASK_ON: True
  IMAGE_ONLY: True
  META_ARCHITECTURE: "VLGeneralizedRCNN"
  PIXEL_MEAN: [ 127.5, 127.5, 127.5 ]
  PIXEL_STD: [ 127.5, 127.5, 127.5 ]
  WEIGHTS: "/Users/jordanparker/Programs/contractron/external/layoutlmv3-base-finetuned-publaynet/model_final.pth"
  BACKBONE:
    NAME: "build_vit_fpn_backbone"
  VIT:
    NAME: "layoutlmv3_base"
    OUT_FEATURES: [ "layer3", "layer5", "layer7", "layer11" ]
    DROP_PATH: 0.1
    IMG_SIZE: [ 224,224 ]
    POS_TYPE: "abs"
  ROI_HEADS:
    NAME: CascadeROIHeads
    IN_FEATURES: [ "p2", "p3", "p4", "p5" ]
    NUM_CLASSES: 5
  ROI_BOX_HEAD:
    CLS_AGNOSTIC_BBOX_REG: True
    NAME: "FastRCNNConvFCHead"
    NUM_FC: 2
    POOLER_RESOLUTION: 7
  ROI_MASK_HEAD:
    NAME: "MaskRCNNConvUpsampleHead"
    NUM_CONV: 4
    POOLER_RESOLUTION: 14
  FPN:
    IN_FEATURES: [ "layer3", "layer5", "layer7", "layer11" ]
  ANCHOR_GENERATOR:
    SIZES: [ [ 32 ], [ 64 ], [ 128 ], [ 256 ], [ 512 ] ]  # One size for each in feature map
    ASPECT_RATIOS: [ [ 0.5, 1.0, 2.0 ] ]  # Three aspect ratios (same for all in feature maps)
  RPN:
    IN_FEATURES: [ "p2", "p3", "p4", "p5", "p6" ]
    PRE_NMS_TOPK_TRAIN: 2000  # Per FPN level
    PRE_NMS_TOPK_TEST: 1000  # Per FPN level
    # Detectron1 uses 2000 proposals per-batch,
    # (See "modeling/rpn/rpn_outputs.py" for details of this legacy issue)
    # which is approximately 1000 proposals per-image since the default batch size for FPN is 2.
    POST_NMS_TOPK_TRAIN: 2000
    POST_NMS_TOPK_TEST: 1000
DATASETS:
  TRAIN: ("train",)
  TEST: ("val",)
SOLVER:
  GRADIENT_ACCUMULATION_STEPS: 1
  BASE_LR: 0.0002
  WARMUP_ITERS: 1000
  IMS_PER_BATCH: 32
  MAX_ITER: 60000
  CHECKPOINT_PERIOD: 2000
  LR_SCHEDULER_NAME: "WarmupCosineLR"
  AMP:
    ENABLED: True
  OPTIMIZER: "ADAMW"
  BACKBONE_MULTIPLIER: 1.0
  CLIP_GRADIENTS:
    ENABLED: True
    CLIP_TYPE: "full_model"
    CLIP_VALUE: 1.0
    NORM_TYPE: 2.0
  WARMUP_FACTOR: 0.01
  WEIGHT_DECAY: 0.05
TEST:
  EVAL_PERIOD: 2000
INPUT:
  CROP:
    ENABLED: True
    TYPE: "absolute_range"
    SIZE: (384, 600)
  MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
  FORMAT: "RGB"
DATALOADER:
  FILTER_EMPTY_ANNOTATIONS: False
VERSION: 2
AUG:
  DETR: True
SEED: 42
OUTPUT_DIR: "/Users/jordanparker/Programs/contractron/training/output"
PUBLAYNET_DATA_DIR_TRAIN: "/Users/jordanparker/Programs/contractron/training/train"
PUBLAYNET_DATA_DIR_TEST: "/Users/jordanparker/Programs/contractron/training/val"
CACHE_DIR: "~/.cache/huggingface/datasets"

[08/08 12:03:19 detectron2]: Running with full config:
AUG:
  DETR: true
CACHE_DIR: ~/.cache/huggingface/datasets
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: false
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
  - val
  TRAIN:
  - train
GLOBAL:
  HACK: 1.0
ICDAR_DATA_DIR_TEST: ''
ICDAR_DATA_DIR_TRAIN: ''
INPUT:
  CROP:
    ENABLED: true
    SIZE:
    - 384
    - 600
    TYPE: absolute_range
  FORMAT: RGB
  MASK_FORMAT: polygon
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN:
  - 480
  - 512
  - 544
  - 576
  - 608
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
  MIN_SIZE_TRAIN_SAMPLING: choice
  RANDOM_FLIP: horizontal
MODEL:
  ANCHOR_GENERATOR:
    ANGLES:
    - - -90
      - 0
      - 90
    ASPECT_RATIOS:
    - - 0.5
      - 1.0
      - 2.0
    NAME: DefaultAnchorGenerator
    OFFSET: 0.0
    SIZES:
    - - 32
    - - 64
    - - 128
    - - 256
    - - 512
  BACKBONE:
    FREEZE_AT: 2
    NAME: build_vit_fpn_backbone
  CONFIG_PATH: ''
  DEVICE: cpu
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES:
    - layer3
    - layer5
    - layer7
    - layer11
    NORM: ''
    OUT_CHANNELS: 256
  IMAGE_ONLY: true
  KEYPOINT_ON: false
  LOAD_PROPOSALS: false
  MASK_ON: true
  META_ARCHITECTURE: VLGeneralizedRCNN
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN:
  - 127.5
  - 127.5
  - 127.5
  PIXEL_STD:
  - 127.5
  - 127.5
  - 127.5
  PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
  RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    - false
    - false
    - false
    - false
    DEPTH: 50
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES:
    - res4
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: true
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_WEIGHTS: &id001
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.4
    - 0.5
    NMS_THRESH_TEST: 0.5
    NORM: ''
    NUM_CLASSES: 80
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS:
    - - 10.0
      - 10.0
      - 5.0
      - 5.0
    - - 20.0
      - 20.0
      - 10.0
      - 10.0
    - - 30.0
      - 30.0
      - 15.0
      - 15.0
    IOUS:
    - 0.5
    - 0.6
    - 0.7
  ROI_BOX_HEAD:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS:
    - 10.0
    - 10.0
    - 5.0
    - 5.0
    CLS_AGNOSTIC_BBOX_REG: true
    CONV_DIM: 256
    FC_DIM: 1024
    FED_LOSS_FREQ_WEIGHT_POWER: 0.5
    FED_LOSS_NUM_CLASSES: 50
    NAME: FastRCNNConvFCHead
    NORM: ''
    NUM_CONV: 0
    NUM_FC: 2
    POOLER_RESOLUTION: 7
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
    SMOOTH_L1_BETA: 0.0
    TRAIN_ON_PRED_BOXES: false
    USE_FED_LOSS: false
    USE_SIGMOID_CE: false
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    NAME: CascadeROIHeads
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 5
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: true
    SCORE_THRESH_TEST: 0.05
  ROI_KEYPOINT_HEAD:
    CONV_DIMS:
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: false
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM: ''
    NUM_CONV: 4
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS: *id001
    BOUNDARY_THRESH: -1
    CONV_DIMS:
    - -1
    HEAD_NAME: StandardRPNHead
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    - p6
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.3
    - 0.7
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 2000
    PRE_NMS_TOPK_TEST: 1000
    PRE_NMS_TOPK_TRAIN: 2000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    COMMON_STRIDE: 4
    CONVS_DIM: 128
    IGNORE_VALUE: 255
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    LOSS_WEIGHT: 1.0
    NAME: SemSegFPNHead
    NORM: GN
    NUM_CLASSES: 54
  VIT:
    DROP_PATH: 0.1
    IMG_SIZE:
    - 224
    - 224
    MODEL_KWARGS: '{}'
    NAME: layoutlmv3_base
    OUT_FEATURES:
    - layer3
    - layer5
    - layer7
    - layer11
    POS_TYPE: abs
  WEIGHTS: ./models/layoutlmv3-base/pytorch_model.bin
OUTPUT_DIR: ./training/output
PUBLAYNET_DATA_DIR_TEST: ./training/val/
PUBLAYNET_DATA_DIR_TRAIN: ./training/train/
SEED: 42
SOLVER:
  AMP:
    ENABLED: true
  BACKBONE_MULTIPLIER: 1.0
  BASE_LR: 0.0025
  BASE_LR_END: 0.0
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 2000
  CLIP_GRADIENTS:
    CLIP_TYPE: full_model
    CLIP_VALUE: 1.0
    ENABLED: true
    NORM_TYPE: 2.0
  GAMMA: 0.1
  GRADIENT_ACCUMULATION_STEPS: 1
  IMS_PER_BATCH: 2
  LR_SCHEDULER_NAME: WarmupCosineLR
  MAX_ITER: 60000
  MOMENTUM: 0.9
  NESTEROV: false
  OPTIMIZER: ADAMW
  REFERENCE_WORLD_SIZE: 0
  STEPS:
  - 30000
  WARMUP_FACTOR: 0.01
  WARMUP_ITERS: 1000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.05
  WEIGHT_DECAY_BIAS: null
  WEIGHT_DECAY_NORM: 0.0
TEST:
  AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    - 400
    - 500
    - 600
    - 700
    - 800
    - 900
    - 1000
    - 1100
    - 1200
  DETECTIONS_PER_IMAGE: 100
  EVAL_PERIOD: 2000
  EXPECTED_RESULTS: []
  KEYPOINT_OKS_SIGMAS: []
  PRECISE_BN:
    ENABLED: false
    NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0

[08/08 12:03:19 detectron2]: Full config saved to ./training/output/config.yaml
WARNING [08/08 12:03:21 d2.data.datasets.coco]: 
Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.

[08/08 12:03:21 d2.data.datasets.coco]: Loaded 922 images in COCO format from ./training/data/metadata.json
[08/08 12:03:21 d2.data.build]: Distribution of instances among all 20 categories:
[08/08 12:03:21 d2.data.build]: Using training sampler TrainingSampler
[08/08 12:03:21 d2.data.common]: Serializing 922 elements to byte tensors and concatenating them all ...
[08/08 12:03:21 d2.data.common]: Serialized dataset takes 0.46 MiB
[08/08 12:03:22 fvcore.common.checkpoint]: [Checkpointer] Loading from ./models/layoutlmv3-base/pytorch_model.bin ...
WARNING [08/08 12:03:22 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
backbone.bottom_up.backbone.encoder.fpn1.0.{bias, weight}
backbone.bottom_up.backbone.encoder.fpn1.1.{bias, running_mean, running_var, weight}
backbone.bottom_up.backbone.encoder.fpn1.3.{bias, weight}
backbone.bottom_up.backbone.encoder.fpn2.0.{bias, weight}
backbone.fpn_lateral2.{bias, weight}
backbone.fpn_lateral3.{bias, weight}
backbone.fpn_lateral4.{bias, weight}
backbone.fpn_lateral5.{bias, weight}
backbone.fpn_output2.{bias, weight}
backbone.fpn_output3.{bias, weight}
backbone.fpn_output4.{bias, weight}
backbone.fpn_output5.{bias, weight}
proposal_generator.rpn_head.anchor_deltas.{bias, weight}
proposal_generator.rpn_head.conv.{bias, weight}
proposal_generator.rpn_head.objectness_logits.{bias, weight}
roi_heads.box_head.0.fc1.{bias, weight}
roi_heads.box_head.0.fc2.{bias, weight}
roi_heads.box_head.1.fc1.{bias, weight}
roi_heads.box_head.1.fc2.{bias, weight}
roi_heads.box_head.2.fc1.{bias, weight}
roi_heads.box_head.2.fc2.{bias, weight}
roi_heads.box_predictor.0.bbox_pred.{bias, weight}
roi_heads.box_predictor.0.cls_score.{bias, weight}
roi_heads.box_predictor.1.bbox_pred.{bias, weight}
roi_heads.box_predictor.1.cls_score.{bias, weight}
roi_heads.box_predictor.2.bbox_pred.{bias, weight}
roi_heads.box_predictor.2.cls_score.{bias, weight}
roi_heads.mask_head.deconv.{bias, weight}
roi_heads.mask_head.mask_fcn1.{bias, weight}
roi_heads.mask_head.mask_fcn2.{bias, weight}
roi_heads.mask_head.mask_fcn3.{bias, weight}
roi_heads.mask_head.mask_fcn4.{bias, weight}
roi_heads.mask_head.predictor.{bias, weight}
WARNING [08/08 12:03:22 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
  backbone.bottom_up.backbone.embeddings.position_ids
  backbone.bottom_up.backbone.embeddings.word_embeddings.weight
  backbone.bottom_up.backbone.embeddings.position_embeddings.weight
  backbone.bottom_up.backbone.embeddings.token_type_embeddings.weight
  backbone.bottom_up.backbone.embeddings.LayerNorm.{bias, weight}
  backbone.bottom_up.backbone.embeddings.x_position_embeddings.weight
  backbone.bottom_up.backbone.embeddings.y_position_embeddings.weight
  backbone.bottom_up.backbone.embeddings.h_position_embeddings.weight
  backbone.bottom_up.backbone.embeddings.w_position_embeddings.weight
  backbone.bottom_up.backbone.encoder.rel_pos_bias.weight
  backbone.bottom_up.backbone.encoder.rel_pos_x_bias.weight
  backbone.bottom_up.backbone.encoder.rel_pos_y_bias.weight
[08/08 12:03:22 d2.engine.train_loop]: Starting training from iteration 0
/opt/conda/lib/python3.7/site-packages/transformers/modeling_utils.py:714: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  "The `device` argument is deprecated and will be removed in v5 of Transformers.", FutureWarning
/opt/conda/lib/python3.7/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
ERROR [08/08 12:04:02 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/jupyter/contractron/contractron/models/layoutlmv3/object_detection/ditod/mytrainer.py", line 506, in run_step
    self._trainer.run_step()
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 413, in run_step
    loss_dict = self.model(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jupyter/contractron/contractron/models/layoutlmv3/object_detection/ditod/rcnn_vl.py", line 74, in forward
    _, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/roi_heads/cascade_rcnn.py", line 144, in forward
    losses = self._forward_box(features, proposals, targets)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/roi_heads/cascade_rcnn.py", line 183, in _forward_box
    stage_losses = predictor.losses(predictions, proposals)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/roi_heads/fast_rcnn.py", line 344, in losses
    loss_cls = cross_entropy(scores, gt_classes, reduction="mean")
  File "/opt/conda/lib/python3.7/site-packages/detectron2/layers/wrappers.py", line 56, in wrapped_loss_func
    return loss_func(input, target, reduction=reduction, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 3014, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
IndexError: Target 9 is out of bounds.
[08/08 12:04:02 d2.engine.hooks]: Total training time: 0:00:40 (0:00:00 on hooks)
[08/08 12:04:02 d2.utils.events]:  iter: 0    lr: N/A  max_mem: 0M
Traceback (most recent call last):
  File "contractron/models/layoutlmv3/object_detection/train.py", line 124, in <module>
    args=(args,),
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "contractron/models/layoutlmv3/object_detection/train.py", line 97, in main
    return trainer.train()
  File "/home/jupyter/contractron/contractron/models/layoutlmv3/object_detection/ditod/mytrainer.py", line 495, in train
    super().train(self.start_iter, self.max_iter)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/jupyter/contractron/contractron/models/layoutlmv3/object_detection/ditod/mytrainer.py", line 506, in run_step
    self._trainer.run_step()
  File "/opt/conda/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 413, in run_step
    loss_dict = self.model(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jupyter/contractron/contractron/models/layoutlmv3/object_detection/ditod/rcnn_vl.py", line 74, in forward
    _, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/roi_heads/cascade_rcnn.py", line 144, in forward
    losses = self._forward_box(features, proposals, targets)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/roi_heads/cascade_rcnn.py", line 183, in _forward_box
    stage_losses = predictor.losses(predictions, proposals)
  File "/opt/conda/lib/python3.7/site-packages/detectron2/modeling/roi_heads/fast_rcnn.py", line 344, in losses
    loss_cls = cross_entropy(scores, gt_classes, reduction="mean")
  File "/opt/conda/lib/python3.7/site-packages/detectron2/layers/wrappers.py", line 56, in wrapped_loss_func
    return loss_func(input, target, reduction=reduction, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 3014, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
IndexError: Target 9 is out of bounds.

jordanparker6 avatar Aug 08 '22 12:08 jordanparker6

Hi, has your problem been solved? Have you run the example code to see whether training the model on PubLayNet with GPUs works? I have not tried training on a CPU with a custom dataset, so I have not encountered this problem. From the error message, I suspect this is an indexing issue related to your custom dataset. I hope these answers (e.g., 1, 2, 3) are helpful.
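As a debugging hint: the warning in your log, `Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.`, means Detectron2 builds a contiguous id mapping roughly like the sketch below (a simplification of what `d2.data.datasets.coco` does, not the exact code):

```python
def contiguous_id_map(categories):
    """Map original COCO category ids onto contiguous labels 0..N-1,
    roughly mirroring Detectron2's "We'll apply a mapping for you" warning."""
    sorted_ids = sorted(c["id"] for c in categories)
    return {orig: idx for idx, orig in enumerate(sorted_ids)}

# Your log shows 20 categories, so mapped targets span 0..19, while a
# head built with NUM_CLASSES=5 only accepts targets 0..5 (5 classes
# plus background) -- hence "IndexError: Target 9 is out of bounds".
cats = [{"id": i} for i in range(1, 21)]
mapping = contiguous_id_map(cats)
print(max(mapping.values()))  # 19
```

So the fix is likely to make `ROI_HEADS.NUM_CLASSES` match the number of categories in your custom dataset.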

HYPJUDY avatar Sep 03 '22 06:09 HYPJUDY

I am closing this issue for now, since it is inactive.

HYPJUDY avatar Nov 03 '22 15:11 HYPJUDY