Training problem

Open skyhehe123 opened this issue 4 years ago • 15 comments

Hi, thanks for sharing this great work. I trained on train (3712) and evaluated on val (3769), but can only get a very low AP. Could you share your training log file so that I can trace the bug?

skyhehe123 avatar May 15 '20 04:05 skyhehe123

Hi,

How much AP did you achieve? As indicated in Sec. 5.1 of our paper, performance degrades on the train/val split because of the lack of training examples. This is reasonable, since detecting each object as a single point is a difficult task.

Or maybe there is an undiscovered bug in the code...

lzccccc avatar May 15 '20 07:05 lzccccc

Hi @lzccccc ,

Thanks for your wonderful code. I also tried training on the training set (3712) and evaluating on the validation set (3769), but I only get the following AP (3D / BEV):

      Easy         Moderate     Hard
Car   10.7 / 15.9  7.7 / 12.2   7.7 / 10.2

which is much lower than what the paper reports.

I use the original setting that trains on 4 GPUs. Should I train longer or use a smaller learning rate?

Best regards, Qing LIAN

lianqing01 avatar Jun 14 '20 02:06 lianqing01

I can get similar results with the released code on the validation dataset.

Thanks for kindly sharing the code.

lianqing11 avatar Jun 17 '20 01:06 lianqing11

Hi, could you share how you solved it? Thanks.

ZhxJia avatar Aug 10 '20 05:08 ZhxJia

I have exactly the same problem. Following the paper, I trained on train (3712) for 60 epochs, dropping the learning rate by a factor of 10 at epochs 25 and 40, but I still get low AP on val (3769):

car_detection_3d AP: 11.122600 8.383281 6.976796
pedestrian_detection_3d AP: 2.784572 2.732527 2.739558
cyclist_detection_3d AP: 0.606061 0.568182 0.568182

Could anyone share the solution? Thanks.

ZiYang-xie avatar Jan 29 '21 06:01 ZiYang-xie

Hi, thanks for the high-quality code. I have solved the problem by simply using 14500 iterations in default.py, which is about 60 epochs on the trainval (7481) set but about 120 epochs on the train (3712) set. With that I get:

car_detection_3d AP: 16.485666 14.154558 11.966417
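
As a rough check of the iteration counts mentioned here, the conversion from iterations to epochs can be sketched as follows. This assumes a total batch size of 32 images per iteration (the IMS_PER_BATCH value in the config posted later in this thread); the helper name iters_to_epochs is made up for illustration and is not part of SMOKE.

```python
def iters_to_epochs(num_iters, dataset_size, batch_size=32):
    """Approximate number of passes over the dataset.

    Hypothetical helper; batch_size=32 is an assumption taken from the
    IMS_PER_BATCH value in a config posted in this thread.
    """
    return num_iters * batch_size / dataset_size


# Compare the same iteration budget on the two splits discussed above.
for name, size in [("trainval", 7481), ("train", 3712)]:
    epochs = iters_to_epochs(14500, size)
    print("14500 iters on {} ({} images): {:.1f} epochs".format(name, size, epochs))
```

Under that batch-size assumption, 14500 iterations works out to roughly 62 epochs on trainval and 125 epochs on train, in line with the "about 60" and "about 120" figures quoted above.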

ZiYang-xie avatar Jan 30 '21 02:01 ZiYang-xie

Do you mean that the results become normal after setting the iterations to 7250 to keep 60 epochs on the train set? I didn't change any code (which means I used 25000 iterations), and my results are even lower than yours:

            Easy          Moderate     Hard
Car         6.74 / 12.17  4.35 / 8.09  4.02 / 7.65
Pedestrian  2.12 / 2.63   1.78 / 1.93  1.39 / 1.52
Cyclist     1.04 / 1.34   0.41 / 0.50  0.41 / 0.51

Does that mean my training is overfitting?

mrsempress avatar Mar 11 '21 05:03 mrsempress

I trained the network using the following setup:

MODEL:
  WEIGHT: "catalog://ImageNetPretrained/DLA34"
INPUT:
  FLIP_PROB_TRAIN: 0.5
  SHIFT_SCALE_PROB_TRAIN: 0.3
OUTPUT_DIR: "./tools/logs_trainOnTrain_120ep"
DATASETS:
  DETECT_CLASSES: ("Car", "Cyclist", "Pedestrian")
  TRAIN: ("kitti_train",)
  TEST: ("kitti_train",)
  TRAIN_SPLIT: "train"
  TEST_SPLIT: "val"
SOLVER:
  BASE_LR: 2.5e-4
  STEPS: (5800, 9280)
  MAX_ITERATION: 13920
  IMS_PER_BATCH: 32

and got these results:

car_R11
2d, 88.27, 78.84, 70.18
bev, 24.41, 19.57, 16.81
3d, 18.25, 15.26, 14.21
aos, 88.04, 78.41, 69.58

car_R40
2d, 91.47, 83.36, 74.03
bev, 19.91, 13.72, 11.61
3d, 12.68, 8.85, 7.12
aos, 91.19, 82.84, 73.27

pedestrian_R11
2d, 63.12, 48.89, 41.05
bev, 11.49, 11.27, 11.15
3d, 11.03, 10.89, 10.35
aos, 49.35, 38.70, 32.88

pedestrian_R40
2d, 60.76, 50.06, 41.44
bev, 4.49, 3.91, 3.22
3d, 3.44, 2.92, 2.29
aos, 46.02, 37.66, 31.16

cyclist_R11
2d, 48.29, 34.82, 33.71
bev, 4.23, 3.80, 3.03
3d, 3.25, 2.27, 2.27
aos, 31.60, 22.19, 21.75

cyclist_R40
2d, 48.38, 31.85, 29.94
bev, 2.13, 1.18, 0.87
3d, 1.74, 0.74, 0.72
aos, 32.51, 21.08, 19.96
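
For readers checking the SOLVER settings in the config above: with IMS_PER_BATCH: 32 and 3712 training images there are 116 iterations per epoch, so MAX_ITERATION: 13920 corresponds to 120 epochs and the STEPS boundaries fall at epochs 50 and 80. The sketch below reproduces that arithmetic and shows what a step schedule would look like. The decay factor GAMMA = 0.1 is an assumption (the config does not state it), and lr_at is a hypothetical helper, not SMOKE code.

```python
import bisect

# Values copied from the config above.
IMS_PER_BATCH = 32
BASE_LR = 2.5e-4
STEPS = (5800, 9280)
MAX_ITERATION = 13920

TRAIN_IMAGES = 3712  # size of the train split quoted in this thread
GAMMA = 0.1          # assumed decay factor; not given in the config

iters_per_epoch = TRAIN_IMAGES // IMS_PER_BATCH  # 3712 / 32 = 116


def lr_at(iteration):
    """Learning rate at a given iteration under a plain step schedule."""
    # Count how many step boundaries have been reached so far.
    num_decays = bisect.bisect_right(STEPS, iteration)
    return BASE_LR * GAMMA ** num_decays


print("total epochs:", MAX_ITERATION / iters_per_epoch)
print("decay epochs:", [s / iters_per_epoch for s in STEPS])
print("lr at iterations 0, 5800, 9280:", lr_at(0), lr_at(5800), lr_at(9280))
```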

vobecant avatar Mar 29 '21 08:03 vobecant

@vobecant Could you share which IoU you used to get these results?

nikhil-nakhate avatar Mar 29 '21 22:03 nikhil-nakhate

I used your config and wasn't able to replicate the results. Did you make any additional changes? @vobecant

nikhil-nakhate avatar Mar 30 '21 00:03 nikhil-nakhate

@nikhil-nakhate I didn't make any additional changes. I ran exactly this command:

now=$(date +"%Y%m%d_%H%M%S")
EXPNAME=SMOKE_trainOnTrain_120ep_${now}
JOB_FILE=./jobs/${EXPNAME}.job
g=4
NUMCPUS=16
CONFIG_FILE=/path/to/SMOKE/configs/smoke_trainOnTrain_testOnVal_120ep.yaml
python tools/plain_train_net.py --num-gpus 4 --config-file ${CONFIG_FILE}

To get the numbers that I reported, I used this command:

python tools/plain_train_net.py --eval-only --config-file "${CONFIG_FILE}"

followed by running my new script:

import numpy as np
import os


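def get_ap(prec, ap_type):
    # prec is expected to hold 41 precision samples (recall 0, 1/40, ..., 1).
    prec = np.asarray(prec)
    sums = 0
    if ap_type == 11:
        # R11: every 4th sample, i.e. 11 points including recall 0.
        for i in range(0, prec.shape[-1], 4):
            sums = sums + prec[..., i]
        ap = sums / 11 * 100
    else:
        # R40: skip the recall-0 sample and average the remaining 40 points.
        for i in range(1, prec.shape[-1]):
            sums = sums + prec[..., i]
        ap = sums / 40 * 100
    return ap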


def get_aps(results_dir):
    def print_file(s, f):
        f.write('{}\n'.format(s))
        print(s)

    labels = ['car', 'pedestrian', 'cyclist']
    eval_types = ['detection', 'detection_ground', 'detection_3d', 'orientation']
    eval_types_short = {'detection': '2d', 'detection_3d': '3d', 'orientation': 'aos', 'detection_ground': 'bev'}
    difficulties = ['easy', 'moderate', 'hard']

    res_path = os.path.join(results_dir, 'parsed_res.txt')
    # Use a context manager so the output file is closed even if parsing fails.
    with open(res_path, 'w') as f:
        for label in labels:
            for ap_type in [11, 40]:
                print_file('\n{}_R{}'.format(label, ap_type), f)
                for eval_type in eval_types:
                    res_file = os.path.join(results_dir, 'stats_{}_{}.txt'.format(label, eval_type))
                    with open(res_file, 'r') as fl:
                        lines = fl.readlines()
                    diff_res = [eval_types_short[eval_type]]
                    # Lines 0..2 of each stats file are read as easy, moderate, hard.
                    for i, difficulty in enumerate(difficulties):
                        prec = [float(tmp) for tmp in lines[i].split()]
                        ap_res = get_ap(prec, ap_type)
                        diff_res.append('{:.2f}'.format(ap_res))
                    print_file(', '.join(diff_res), f)
    print('Saved parsed results to {}'.format(res_path))


if __name__ == '__main__':
    results_dir = '/path/to/results/logs_trainOnTrain_120ep/inference/kitti_train'
    get_aps(results_dir)

Feel free to check the code!
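
To illustrate why the script above reports different R11 and R40 values for the same detections, here is a small vectorized sketch of the two averages on a toy precision curve. It assumes 41 precision samples per curve, which is what the indexing in get_ap implies; the function names and the toy curve are made up for illustration.

```python
import numpy as np


def ap_r11(prec):
    # Every 4th of the 41 samples: 11 points, including the recall-0 sample.
    return prec[..., ::4].sum(axis=-1) / 11 * 100


def ap_r40(prec):
    # Samples 1..40: 40 points, excluding the recall-0 sample.
    return prec[..., 1:].sum(axis=-1) / 40 * 100


# Toy curve (made up): precision 1.0 for the first 9 samples, 0 afterwards.
prec = np.zeros(41)
prec[:9] = 1.0

print("R11: {:.2f}".format(ap_r11(prec)))
print("R40: {:.2f}".format(ap_r40(prec)))
```

On this toy curve R11 gives about 27.27 and R40 gives 20.00, partly because R11 counts the recall-0 sample. Numbers computed with the two conventions are therefore not directly comparable.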

vobecant avatar Mar 30 '21 06:03 vobecant

Hey @vobecant, Thanks so much for this. It really helps. Let me get back to you with the training results.

nikhil-nakhate avatar Mar 31 '21 00:03 nikhil-nakhate

@vobecant The following are the results that I got with your configs:

car_R11
2d, 84.38, 76.68, 68.68
bev, 21.89, 18.15, 15.98
3d, 16.25, 13.93, 13.66
aos, 84.17, 76.05, 67.74

car_R40
2d, 84.13, 77.06, 70.24
bev, 15.62, 10.93, 9.48
3d, 9.06, 6.34, 5.79
aos, 83.92, 76.36, 69.15

pedestrian_R11
2d, 54.75, 47.58, 40.32
bev, 6.33, 6.18, 5.85
3d, 5.87, 5.62, 5.64
aos, 37.60, 33.65, 29.05

pedestrian_R40
2d, 54.87, 46.86, 40.52
bev, 3.05, 2.71, 2.14
3d, 2.09, 1.88, 1.69
aos, 34.69, 30.42, 26.16

cyclist_R11
2d, 49.69, 35.25, 34.59
bev, 2.06, 1.14, 1.14
3d, 2.05, 1.14, 1.14
aos, 36.92, 23.43, 22.65

cyclist_R40
2d, 48.46, 32.58, 30.64
bev, 0.99, 0.61, 0.52
3d, 0.83, 0.47, 0.35
aos, 33.77, 22.03, 20.56

nikhil-nakhate avatar Mar 31 '21 05:03 nikhil-nakhate

@vobecant Hello, I ran the code with your configuration and got these results:

car_R11
2d, 19.04, 13.31, 12.71
bev, 9.09, 4.55, 4.55
3d, 9.09, 4.55, 4.55
aos, 18.28, 12.45, 11.73

car_R40
2d, 12.98, 12.04, 10.87
bev, 0.55, 0.47, 0.35
3d, 0.10, 0.05, 0.05
aos, 12.17, 11.16, 9.93

pedestrian_R11
2d, 14.54, 9.74, 9.79
bev, 0.91, 0.71, 0.75
3d, 0.64, 0.56, 0.56
aos, 6.74, 4.66, 4.69

pedestrian_R40
2d, 10.67, 8.03, 6.73
bev, 0.46, 0.20, 0.20
3d, 0.34, 0.15, 0.15
aos, 4.94, 3.76, 3.17

cyclist_R11
2d, 1.73, 1.57, 1.57
bev, 0.00, 0.00, 0.00
3d, 0.00, 0.00, 0.00
aos, 0.01, 0.46, 0.46

cyclist_R40
2d, 0.48, 0.43, 0.43
bev, 0.00, 0.00, 0.00
3d, 0.00, 0.00, 0.00

Do you have any idea why they are so low? Thank you.

arwagh avatar Dec 05 '23 08:12 arwagh