
Reproduction of M2Track

Open st724586 opened this issue 2 years ago • 10 comments

Dear authors,

I am not able to reproduce the results reported in the paper using the given config at M2_track_kitti.yaml

The paper reports 61.5/88.2 for the pedestrian class on KITTI but I have 52.1/76.5.

I wonder if the same config used for generating the results in the paper is provided. Or is there anything else we can take note of to reproduce the results?

Thanks!

st724586 avatar Jul 07 '22 03:07 st724586

The config file for training KITTI Pedestrian is the same as the one we used for Car. However, since Pedestrian has much less training data than Car in KITTI, we recommend using a relatively smaller batch size (e.g., 64 in total) to train the pedestrian tracker.

Ghostish avatar Jul 07 '22 04:07 Ghostish

> The config file for training KITTI Pedestrian is the same as the one we used for Car. However, since Pedestrian has much less training data than Car in KITTI, we recommend using a relatively smaller batch size (e.g., 64 in total) to train the pedestrian tracker.

Thanks for your reply. I have tried training with batch size 64 for 180 epochs for the pedestrian class, but the performance (55.0/76.6) is still a bit off from the reported numbers. Could you kindly provide the training configs for the different classes if the settings are not exactly the same as the default config?

st724586 avatar Jul 08 '22 07:07 st724586

As I said, the config is the same except for the batch size. Could you please provide more details about your platform (e.g., CUDA version, PyTorch version, GPU specs)?

Can you reproduce the results using the provided checkpoints?

By the way, the batch size you set in the command line is used for each GPU. For example, if you are using two GPUs and set the batch size to 64 in the command line, the total batch size becomes 128. Please make sure your setting is correct.
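To make the per-GPU semantics concrete, here is a minimal sketch (the function and parameter names are illustrative, not flags or code from the repo):

```python
# The batch size passed on the command line is applied per GPU,
# so the effective (total) batch size scales with the GPU count.
def effective_batch_size(per_gpu_batch_size: int, num_gpus: int) -> int:
    return per_gpu_batch_size * num_gpus

# Two GPUs with a per-GPU batch size of 64 give a total of 128;
# to hit a total of 64 on two GPUs, pass 32 per GPU instead.
print(effective_batch_size(64, 2))  # 128
print(effective_batch_size(32, 2))  # 64
```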

Ghostish avatar Jul 08 '22 08:07 Ghostish

Thanks for your reply. My environment settings are: PyTorch 1.8.0, CUDA 11.4, GPU: V100 32GB. And I trained all models with only one GPU.

I tested with your provided checkpoints and got 60.7 / 89.4 for Ped and 67.4 / 81.0 for Car, so I assume the data and environment are fine.

Also, I trained all categories with different batch sizes and epochs and got the following results:

|  | cyc | van | ped | car |
| --- | --- | --- | --- | --- |
| m2track (from paper) | 73.2 / 93.5 | 53.8 / 70.7 | 61.5 / 88.2 | 65.5 / 80.8 |
| m2track (bs64 epoch60) | 76.79 / 94.59 | 50.18 / 63.70 | 52.1 / 76.5 | 66.1 / 78.9 |
| m2track (bs256 epoch180) | 67.87 / 89.54 | 46.22 / … | 50.84 / 78.34 | 65.48 / 78.57 |
| m2track (bs64 epoch180) | 67.11 / 93.46 | 50.0 / 64.09 | 55.04 / 79.62 | 63.55 / 77.85 |

The performance for cyc and car is quite close to your results, but the van and ped results are still a bit off from yours.

The hparams.yaml file saved for my Pedestrian experiment is as below. I wonder if you can help check whether it is correct; I would be very grateful for any help.

config: !!python/object/new:easydict.EasyDict
  dictitems:
    IoU_space: 3
    angle_weight: 10.0
    batch_size: 64
    bb_offset: 2
    bb_scale: 1.25
    bc_weight: 1
    box_aware: true
    category_name: Pedestrian
    center_weight: 2
    cfg: cfgs/M2_track_kitti_ped_bs256.yaml
    check_val_every_n_epoch: 1
    checkpoint: null
    coordinate_mode: velodyne
    data_limit_box: true
    dataset: kitti
    degrees: false
    epoch: 180
    from_epoch: 0
    gradient_clip_val: 0.0
    limit_box: false
    log_dir: work_dir/M2_track_kitti_ped_bs64
    lr: 0.001
    lr_decay_rate: 0.1
    lr_decay_step: 20
    motion_cls_seg_weight: 0.1
    motion_threshold: 0.15
    net_model: m2track
    num_candidates: 4
    optimizer: Adam
    path: ./data/kitti/training
    point_sample_size: 1024
    preload_offset: 10
    preloading: true
    save_top_k: -1
    seg_weight: 0.1
    test: false
    test_split: test
    train_split: train
    train_type: train_motion
    up_axis: &id001
    - 0
    - 0
    - 1
    use_augmentation: true
    use_z: true
    val_split: test
    wd: 0
    workers: 10
  state:
    IoU_space: 3
    angle_weight: 10.0
    batch_size: 64
    bb_offset: 2
    bb_scale: 1.25
    bc_weight: 1
    box_aware: true
    category_name: Pedestrian
    center_weight: 2
    cfg: cfgs/M2_track_kitti_ped_bs256.yaml
    check_val_every_n_epoch: 1
    checkpoint: null
    coordinate_mode: velodyne
    data_limit_box: true
    dataset: kitti
    degrees: false
    epoch: 180
    from_epoch: 0
    gradient_clip_val: 0.0
    limit_box: false
    log_dir: work_dir/M2_track_kitti_ped_bs64
    lr: 0.001
    lr_decay_rate: 0.1
    lr_decay_step: 20
    motion_cls_seg_weight: 0.1
    motion_threshold: 0.15
    net_model: m2track
    num_candidates: 4
    optimizer: Adam
    path: ./data/kitti/training
    point_sample_size: 1024
    preload_offset: 10
    preloading: true
    save_top_k: -1
    seg_weight: 0.1
    test: false
    test_split: test
    train_split: train
    train_type: train_motion
    up_axis: *id001
    use_augmentation: true
    use_z: true
    val_split: test
    wd: 0
    workers: 10

st724586 avatar Jul 09 '22 02:07 st724586

Hi there, I've tested the code twice with the same config file on a platform similar to yours (a single V100 with PyTorch 1.8.0 and CUDA 11.1). In my experiments, I did not run into your issue, and the two retrained models reached the following results within 30 epochs.

[screenshots: validation results of the two retrained runs]

I suppose this may be related to your training data. Could you add the --preloading flag to your training command to generate the preloaded data file? The generated .dat file should be about 1.3GB on a Linux machine.
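As a quick sanity check on the preloaded file, something like the following works (a sketch; the exact .dat filename and location depend on your split and category, so the path below is a placeholder):

```python
import os

def preload_size_gb(dat_path: str):
    """Return the preloaded .dat file's size in GB, or None if it is missing."""
    if not os.path.exists(dat_path):
        return None
    return os.path.getsize(dat_path) / (1024 ** 3)

# Placeholder path: substitute the .dat file generated by --preloading.
size = preload_size_gb("./data/kitti/training/preload_kitti_Pedestrian.dat")
if size is None:
    print("preloaded file not found; rerun training with --preloading")
else:
    print(f"preloaded file: {size:.2f} GB (expect roughly 1.3 GB)")
```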

By the way, compared to Car, the numbers of training frames for Van and Cyclist are much smaller (1,994 and 1,529, respectively), so training is a bit unstable due to the lack of data. You may need to run more experiments to achieve the reported results.

Let me know if you still have a problem.

Ghostish avatar Jul 09 '22 15:07 Ghostish

Thanks a lot for the reply. Did you pass command-line parameters like --batch_size 64 in your training? It seems the batch size in your config file is overwritten by that argument, and the default batch size is 100 as in main.py. I wonder if maybe that is the difference?

I checked the saved .dat file, which is 1.3GB, same as yours.

Also, I just randomly chose some checkpoints (~10) for testing instead of all of them. Maybe I just got bad luck.

st724586 avatar Jul 09 '22 16:07 st724586

Yes, I passed it in the command line. Actually, the batch size in the config file is unused and confusing; I will remove it soon.

Ghostish avatar Jul 09 '22 16:07 Ghostish

> Thanks a lot for the reply. Did you pass command-line parameters like --batch_size 64 in your training? It seems the batch size in your config file is overwritten by that argument, and the default batch size is 100 as in main.py. I wonder if maybe that is the difference?
>
> I checked the saved .dat file, which is 1.3GB, same as yours.
>
> Also, I just randomly chose some checkpoints (~10) for testing instead of all of them. Maybe I just got bad luck.

In fact, the training frames for Pedestrians are also not sufficient for stable training (only 4600 frames).

And I think it is not a good idea to randomly select checkpoints for validation.

In my experience, precision usually reaches 80+ after the first learning-rate decay (20 epochs). Maybe you can use this to save some time if you do not want to run validation every epoch.
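With the lr 0.001, lr_decay_rate 0.1, and lr_decay_step 20 values from the hparams.yaml above, the step schedule can be sketched as follows (an illustration of the decay rule, not the repo's scheduler code):

```python
def stepped_lr(epoch: int, base_lr: float = 1e-3,
               decay_rate: float = 0.1, decay_step: int = 20) -> float:
    # Learning rate after applying step decay every `decay_step` epochs.
    return base_lr * decay_rate ** (epoch // decay_step)

# The first decay lands at epoch 20, which is roughly where
# precision tends to climb past 80 in these experiments.
for epoch in (0, 19, 20, 40):
    print(epoch, stepped_lr(epoch))
```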

And I wonder how you test the models. Do you use TensorBoard to monitor the metrics, or do you run testing manually after training? According to the config file you provided, check_val_every_n_epoch was set to 1 and val_split was set to "test", which means validation already ran on the test set after every epoch. You don't need to do it manually after training; just use TensorBoard to check all the results.
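If you do want to compare checkpoints by hand rather than in TensorBoard, it is better to scan every epoch's validation metrics than to sample ~10 checkpoints at random. A minimal sketch with made-up numbers (`val_metrics` is hypothetical, not output from the repo):

```python
# Hypothetical per-epoch validation results: epoch -> (success, precision).
val_metrics = {
    20: (52.1, 76.5),
    40: (58.3, 84.0),
    60: (60.9, 87.8),
}

# Take the epoch with the best success score instead of hoping a
# random subset of checkpoints happens to include the peak.
best_epoch = max(val_metrics, key=lambda e: val_metrics[e][0])
print(best_epoch, val_metrics[best_epoch])
```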

Let me know if you have new results.

Ghostish avatar Jul 10 '22 00:07 Ghostish

Below is the result of my bs64/epoch180 Pedestrian experiment:

[screenshot: per-epoch validation results]

st724586 avatar Jul 10 '22 02:07 st724586

> Below is the result of my bs64/epoch180 Pedestrian experiment:
>
> [screenshot: per-epoch validation results]

It seems there are better checkpoints in this training run, and they are close to our reported result, so I think your issue is solved.

Feel free to contact me if you have further questions.

Ghostish avatar Jul 10 '22 02:07 Ghostish