
After 10 epoch error

Open TD-wzw opened this issue 4 years ago • 26 comments

Loading and preparing results...
Traceback (most recent call last):
  File "tools/train.py", line 92, in <module>
    main(args)
  File "tools/train.py", line 87, in main
    trainer.run(train_dataloader, val_dataloader, evaluator)
  File "/disk_2t_02/wangzhiwei/nanodet/nanodet/trainer/trainer.py", line 143, in run
    eval_results = evaluator.evaluate(results, self.cfg.save_dir, epoch, self.logger, rank=self.rank)
  File "/disk_2t_02/wangzhiwei/nanodet/nanodet/evaluator/coco_detection.py", line 55, in evaluate
    coco_dets = self.coco_api.loadRes(json_path)
  File "/home/lhw/anaconda3/envs/pytorch1.6/lib/python3.8/site-packages/pycocotools/coco.py", line 328, in loadRes
    if 'caption' in anns[0]:
IndexError: list index out of range

TD-wzw avatar Jan 20 '21 04:01 TD-wzw

It seems that the results.json file is empty. Can you upload your results.json for me to find out what's going wrong?
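
For reference, pycocotools raises exactly this IndexError when the result list it loads is empty, because COCO.loadRes() indexes anns[0]. A minimal sketch reproducing it (the annotation path is only an example):

import json
from pycocotools.coco import COCO

# Ground-truth annotations; the path is just an illustration.
coco_gt = COCO("dataset/coco_annotations/val_annotations.json")

# An empty result list (equivalent to a results.json containing just [])
# reaches "if 'caption' in anns[0]:" and raises IndexError.
coco_gt.loadRes([])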

RangiLyu avatar Jan 20 '21 05:01 RangiLyu

results-1.txt

TD-wzw avatar Jan 20 '21 06:01 TD-wzw

Why is it empty?

TD-wzw avatar Jan 20 '21 06:01 TD-wzw

That's weird. Maybe the model detected nothing in the val dataset.

RangiLyu avatar Jan 21 '21 02:01 RangiLyu

Thank you. Let me check again.

TD-wzw avatar Jan 21 '21 03:01 TD-wzw

I also encountered the same problem. Did you solve it?

ztt0810 avatar Jan 25 '21 06:01 ztt0810

Take a look at your config file

TD-wzw avatar Jan 25 '21 07:01 TD-wzw

#Config File example
save_dir: workspace/nanodet_m
model:
  arch:
    name: GFL
    backbone:
      name: ShuffleNetV2
      model_size: 1.0x
      out_stages: [2,3,4]
      activation: LeakyReLU
    fpn:
      name: PAN
      in_channels: [116, 232, 464]
      out_channels: 96
      start_level: 0
      num_outs: 3
    head:
      name: NanoDetHead
      num_classes: 80
      input_channel: 96
      feat_channels: 96
      stacked_convs: 2
      share_cls_reg: True
      octave_base_scale: 5
      scales_per_octave: 1
      strides: [8, 16, 32]
      reg_max: 7
      norm_cfg:
        type: BN
      loss:
        loss_qfl:
          name: QualityFocalLoss
          use_sigmoid: True
          beta: 2.0
          loss_weight: 1.0
        loss_dfl:
          name: DistributionFocalLoss
          loss_weight: 0.25
        loss_bbox:
          name: GIoULoss
          loss_weight: 2.0
data:
  train:
    name: coco
    img_path: ./dataset/train/img
    ann_path: ./dataset/coco_annotations/train_annotations.json
    input_size: [320,320] #[w,h]
    keep_ratio: True
    pipeline:
      perspective: 0.0
      scale: [0.6, 1.4]
      stretch: [[1, 1], [1, 1]]
      rotation: 0
      shear: 0
      translate: 0.2
      flip: 0.5
      brightness: 0.2
      contrast: [0.8, 1.2]
      saturation: [0.8, 1.2]
      normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
  val:
    name: coco
    img_path: ./dataset/val/img
    ann_path: ./dataset/coco_annotations/val_annotations.json
    input_size: [320,320] #[w,h]
    keep_ratio: True
    pipeline:
      normalize: [[103.53, 116.28, 123.675], [57.375, 57.12, 58.395]]
device:
  gpu_ids: [0]
  workers_per_gpu: 12
  batchsize_per_gpu: 1
schedule:
  resume:
  load_model: ./trained_models/model_last_100.pth
  optimizer:
    name: SGD
    lr: 0.14
    momentum: 0.9
    weight_decay: 0.0001
  warmup:
    name: linear
    steps: 300
    ratio: 0.1
  total_epochs: 160
  lr_schedule:
    name: MultiStepLR
    milestones: [130,160,150,155]
    gamma: 0.1
  val_intervals: 10
evaluator:
  name: CocoDetectionEvaluator
  save_key: mAP

log:
  interval: 10

#class_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
#              'train', 'truck', 'boat', 'traffic_light', 'fire_hydrant',
#              'stop_sign', 'parking_meter', 'bench', 'bird', 'cat', 'dog',
#              'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
#              'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
#              'skis', 'snowboard', 'sports_ball', 'kite', 'baseball_bat',
#              'baseball_glove', 'skateboard', 'surfboard', 'tennis_racket',
#              'bottle', 'wine_glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
#              'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
#              'hot_dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
#              'potted_plant', 'bed', 'dining_table', 'toilet', 'tv', 'laptop',
#              'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave',
#              'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock',
#              'vase', 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush']

class_names: ['face', 'mask', 'face_mask']

ztt0810 avatar Jan 25 '21 07:01 ztt0810

Check that your val path is correct and that you can read the data
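
For anyone else hitting this, a quick sketch (paths copied from the config above) to confirm that the val annotations load and that every referenced image file actually exists on disk:

import json
import os

img_dir = "./dataset/val/img"
ann_path = "./dataset/coco_annotations/val_annotations.json"

with open(ann_path) as f:
    ann = json.load(f)

# List every image file referenced by the annotations that is missing on disk.
missing = [img["file_name"] for img in ann["images"]
           if not os.path.exists(os.path.join(img_dir, img["file_name"]))]
print(len(ann["images"]), "images listed,", len(missing), "missing on disk")
print(len(ann["annotations"]), "annotations,", len(ann["categories"]), "categories")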

TD-wzw avatar Jan 25 '21 08:01 TD-wzw

I made a small mistake with my val set before.

TD-wzw avatar Jan 25 '21 08:01 TD-wzw

I am sure that every picture in the val set is loaded during validation, but a new problem has appeared at this point (screenshot attached). And when I run test.py with the command python tools/test.py --config config/nanodet-m.yml --task test, the results.json file is still empty.

ztt0810 avatar Jan 26 '21 15:01 ztt0810

I haven't encountered that problem yet.

TD-wzw avatar Jan 27 '21 02:01 TD-wzw

Okay, thanks a lot.

ztt0810 avatar Jan 27 '21 03:01 ztt0810

Are you training on a single GPU?

TD-wzw avatar Jan 27 '21 06:01 TD-wzw

Yes, I use Colab.

ztt0810 avatar Jan 27 '21 11:01 ztt0810

That should be fine, then.

TD-wzw avatar Jan 28 '21 02:01 TD-wzw

The problem may still be in the validation set

TD-wzw avatar Jan 28 '21 02:01 TD-wzw

Okay, thank you. I will check it again.

ztt0810 avatar Jan 28 '21 02:01 ztt0810

My pleasure

TD-wzw avatar Jan 28 '21 02:01 TD-wzw

The problem may still be in the validation set

I met the same problem and checked my val data configuration; there seems to be nothing wrong. How did you fix it, bro?

dada1437903138 avatar Mar 27 '21 15:03 dada1437903138

Okay, thank you. I will check it again.

Have you fixed this problem?

dada1437903138 avatar Mar 27 '21 15:03 dada1437903138

Maybe you can check 'num_classes' in your config file; it should match the number of categories in your dataset.

ztt0810 avatar Mar 28 '21 13:03 ztt0810

  1. Check whether num_classes matches the number of entries in class_names (a quick sanity-check sketch follows this list).
  2. Check that the validation set path is correct and that there are no annotation errors.
  3. If the training data is small, you need to train for several more epochs before the model detects anything; otherwise the results are empty.
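
A quick sanity-check sketch for points 1 and 2 (assumes PyYAML is installed; the config path is just an example):

import json
import yaml

cfg = yaml.safe_load(open("config/nanodet-m.yml"))

# Point 1: num_classes must match the length of class_names.
num_classes = cfg["model"]["arch"]["head"]["num_classes"]
class_names = cfg["class_names"]
assert num_classes == len(class_names), (
    "num_classes=%d but %d class_names given" % (num_classes, len(class_names)))

# Point 2: the val annotation file should load and actually contain annotations.
val_ann = json.load(open(cfg["data"]["val"]["ann_path"]))
print(len(val_ann["images"]), "val images,", len(val_ann["annotations"]), "val annotations")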

carry-xz avatar Jun 03 '21 06:06 carry-xz

If you get this error or zero loss while training, double-check the consistency of *_annotations.json. For example, if you download and export the open-images-v6 dataset with fiftyone, make sure to (a post-processing sketch follows the list):

  • Add iscrowd attribute to each detection
  • Delete unused classes in *_annotations.json > categories field.
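
A post-processing sketch along those lines (the path and the choice of iscrowd=0 for every detection are assumptions on my part, not fiftyone behaviour):

import json

path = "dataset/coco_annotations/train_annotations.json"  # example path
with open(path) as f:
    coco = json.load(f)

# Add the iscrowd attribute that pycocotools expects on every annotation.
for ann in coco["annotations"]:
    ann.setdefault("iscrowd", 0)

# Drop categories that no annotation actually references.
used_ids = {ann["category_id"] for ann in coco["annotations"]}
coco["categories"] = [c for c in coco["categories"] if c["id"] in used_ids]

with open(path, "w") as f:
    json.dump(coco, f)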

ankandrew avatar Jul 07 '21 18:07 ankandrew

I may have fixed it. In my config file, CocoDetectionEvaluator is used:

evaluator:
  name: CocoDetectionEvaluator
  save_key: mAP

If your dataset is not good enough or the number of classes is too large, your model may not converge even after 10 epochs.

Here is how I struggled with the problem:

Case 1: when I trained the model on my custom dataset (a train set of 32 images and a val set of 32 images), the same exception was thrown (I was using a tiny dataset to test whether the environment was properly configured):

if 'caption' in anns[0]:
IndexError: list index out of range

Case 2: however, after I changed the dataset to a train set of 20000 images and a val set of 1000 images, the exception just went away.

Both cases used batch_size=32 and were trained for 1 epoch and then validated once; case 2 just worked fine.

So the conclusion is: don't use the built-in validation process until you are sure that your model has actually learned something (maybe after 20 or more epochs; I just don't have the patience).

If you just want to avoid the exception, delete this from your config file:

evaluator:
  name: CocoDetectionEvaluator
  save_key: mAP

duwangthefirst avatar Jul 29 '21 10:07 duwangthefirst

Hey, I am also facing a similar issue while evaluating the trained model. It throws the error:

TypeError: Object of type Tensor is not JSON serializable (screenshot attached)

Can anyone help me solve this error? Thanks in advance.
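
For context, this error usually means a detection dict handed to json.dump still contains torch.Tensor values. A minimal illustration of the error and the usual workaround (not nanodet's actual code, just a sketch):

import json
import torch

det = {"image_id": 1, "category_id": 0,
       "bbox": torch.tensor([10.0, 20.0, 30.0, 40.0]),
       "score": torch.tensor(0.9)}

# json.dumps(det) would raise:
# TypeError: Object of type Tensor is not JSON serializable

# Convert tensors to plain Python numbers/lists before serializing.
serializable = {k: (v.tolist() if isinstance(v, torch.Tensor) else v)
                for k, v in det.items()}
print(json.dumps(serializable))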

Ratansairohith avatar Oct 12 '21 08:10 Ratansairohith