
the tough process to train the second.pytorch on Nuscenes

Open ConanCui opened this issue 3 years ago • 7 comments

numba should be version 0.40.0:

conda install numba==0.40.0

and set these variables in ~/.bashrc:

export NUMBAPRO_CUDA_DRIVER=/usr/lib/x86_64-linux-gnu/libcuda.so
export NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so
export NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice
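A quick way to see whether the three NUMBAPRO_* variables are actually visible to Python and point at something that exists. This is only a sketch: the helper name check_numbapro_env and the REQUIRED_VARS tuple are made up for illustration, and it only checks that the paths exist, not that numba can load them.

```python
import os

# The three variables exported in ~/.bashrc in this thread.
REQUIRED_VARS = ("NUMBAPRO_CUDA_DRIVER", "NUMBAPRO_NVVM", "NUMBAPRO_LIBDEVICE")


def check_numbapro_env(env=None):
    """Return {variable: problem} for every variable that is unset or points nowhere.

    Hypothetical helper; pass a dict as `env` to check something other
    than the current process environment.
    """
    env = os.environ if env is None else env
    problems = {}
    for name in REQUIRED_VARS:
        path = env.get(name)
        if not path:
            problems[name] = "not set"
        elif not os.path.exists(path):
            problems[name] = "path does not exist: " + path
    return problems


if __name__ == "__main__":
    found = check_numbapro_env()
    for name, problem in found.items():
        print(name + ": " + problem)
    if not found:
        print("all NUMBAPRO_* variables are set and their paths exist")
```

Remember to open a new shell or run source ~/.bashrc before checking, otherwise the exports are not picked up.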

ConanCui, Aug 10 '20 13:08

nuscenes-devkit should be version 1.0.1 according to #290:

pip install nuscenes-devkit==1.0.1

ConanCui, Aug 10 '20 13:08

Fix for the kitti viewer backend issue (#283):

I changed the call in the main function of /backend/main.py from "app.run(host='127.0.0.1', threaded=True, port=port)" to "app.run(host='0.0.0.0', threaded=True, port=port)", and I typed "server ip:port" rather than "127.0.0.1:port" into the "backend" box.

Then run the backend on the default port:

python ./kittiviewer/backend/main.py main

Viewer settings:

datasetClassName: NuScenesDataset
backend: http://"your own ip":16666/
rootPath: /home/kosuke/dataset/nuScenes
infoPath: /home/kosuke/dataset/nuScenes/infos_val.pkl

ConanCui, Aug 12 '20 07:08

pytorch version

pip install torch==1.4.0+cu100 torchvision==0.5.0+cu100 -f https://download.pytorch.org/whl/torch_stable.html

ConanCui, Aug 13 '20 03:08

Results with the all.pp.mida.config:

Generate output labels...
generate label finished(1.93/s). start eval:
Evaluation nusc
Nusc v1.0-trainval Evaluation
bus Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
23.03, 41.42, 53.03, 55.15
car Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
59.25, 71.96, 75.93, 78.26
construction_vehicle Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
0.00, 0.00, 0.05, 0.20
trailer Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
3.47, 15.92, 24.34, 33.10
truck Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
10.11, 22.88, 29.00, 31.40
barrier Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
8.10, 21.51, 29.11, 35.29
bicycle Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
0.00, 0.00, 0.00, 0.00
motorcycle Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
3.08, 5.04, 5.28, 5.38
pedestrian Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
53.05, 54.88, 56.49, 58.82
traffic_cone Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
11.37, 12.28, 13.87, 17.66

ConanCui, Aug 13 '20 05:08

Your hints above have already helped me a lot. I currently have CUDA 10.0 running with cuDNN 7.4.2, and I also got spconv to install under CUDA 10.0 for the torch version you mentioned (1.4). Right now I am trying to create the dataset:

cd second
python create_data.py nuscenes_data_prep --data_path=$EXTERNAL_DATADIR/ext/nuscenes --version="v1.0-trainval" --max_sweeps=10 --dataset_name="NuscenesDatasetVelo"

I had just one problem with this: numba 0.40.0 does not work with newer llvmlite versions. For me, pip install -U llvmlite==0.27 worked. After that, the fire package no longer accepted the command above (I am not sure whether this is related to the llvmlite downgrade or was already the case before), but dropping all the --option= prefixes and passing the values positionally helps:

python create_data.py nuscenes_data_prep $EXTERNAL_DATADIR/ext/nuscenes "v1.0-trainval" "NuscenesDatasetVelo" 10
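Since so much in this thread depends on exact versions, here is a rough sketch for comparing an environment against the pins mentioned so far (numba 0.40.0, llvmlite 0.27, nuscenes-devkit 1.0.1, torch 1.4.0). The names PINS, matches_pin, installed_version and check_pins are invented for this example; the lookup assumes importlib.metadata (Python 3.8+) or the importlib_metadata backport is available.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # older interpreters need the backport package
    import importlib_metadata as metadata

# Version pins collected from the comments in this thread.
PINS = {
    "numba": "0.40.0",
    "llvmlite": "0.27",
    "nuscenes-devkit": "1.0.1",
    "torch": "1.4.0",
}


def matches_pin(installed, pin):
    """True if `installed` equals the pin or only extends it (0.27 -> 0.27.1, 1.4.0 -> 1.4.0+cu100)."""
    if installed is None:
        return False
    return (
        installed == pin
        or installed.startswith(pin + ".")
        or installed.startswith(pin + "+")
    )


def installed_version(name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(installed, pins=PINS):
    """Return {package: installed_version} for every package that violates its pin."""
    return {
        name: installed.get(name)
        for name, pin in pins.items()
        if not matches_pin(installed.get(name), pin)
    }


if __name__ == "__main__":
    current = {name: installed_version(name) for name in PINS}
    for name, version in check_pins(current).items():
        print(name + ": found " + str(version) + ", expected " + PINS[name])
```

An empty result from check_pins only means the versions match the pins; it says nothing about whether spconv or CUDA are set up correctly.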

demmerichs, Sep 16 '20 01:09

I also used the all.pp.mida.config config, but I could not reproduce your results @ConanCui or the ones mentioned in the README (see mine below). Could you help me figure out what I might have done wrong? Or did you perhaps change something in the config yourself? A few things I can think of off the top of my head:

  • Did you pretrain your network or use pretrained networks?
  • Did you do multiple runs and report only the best one?
  • Did you train on multiple GPUs? In principle this should only shorten the runtime, but the README mentions that with multiple GPUs the number of iterations effectively scales accordingly, which could have a major impact.
  • Did you train with or without velocity annotations?
  • Did you do anything else that might be important but is not explained or shown in the README of this repo?

Thanks for your help, I really appreciate it. Below are my (much worse) results:

(secpy):~/code/sync/second.pytorch/second$ python ./pytorch/train.py evaluate --config_path=./configs/nuscenes/all.pp.mida.config --model_dir=$OUTPUT_DATADIR/run25 --measure_time=True --batch_size=1
Restoring parameters from .../run25/voxelnet-58650.tckpt
feature_map_size [1, 100, 100]
Generate output labels...
[100.0%][===================>][10.82it/s][10:55>00:00]
generate label finished(9.10/s). start eval:
avg example to torch time: 25.850 ms
avg prep time: 45.514 ms
avg voxel_feature_extractor time = 17.381 ms
avg middle forward time = 1.823 ms
avg rpn forward time = 10.244 ms
avg predict time = 7.787 ms
100%|##############| 6019/6019 [00:06<00:00, 899.14it/s]
Evaluation nusc
Nusc v1.0-trainval Evaluation
barrier Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
1.21, 8.02, 11.93, 14.71
trans_err, scale_err, orient_err, vel_err, attr_err: 0.6590, 0.3456, 0.0647, nan, nan
bicycle Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
0.00, 0.00, 0.00, 0.00
trans_err, scale_err, orient_err, vel_err, attr_err: 1.0000, 1.0000, 1.0000, 1.0000, 1.0000
bus Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
4.27, 16.65, 27.85, 29.97
trans_err, scale_err, orient_err, vel_err, attr_err: 0.5938, 0.2057, 0.9799, 0.5178, 0.9083
car Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
45.62, 63.50, 68.84, 71.22
trans_err, scale_err, orient_err, vel_err, attr_err: 0.3015, 0.1673, 1.0863, 0.2764, 0.5168
construction_vehicle Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
0.00, 0.00, 0.00, 0.00
trans_err, scale_err, orient_err, vel_err, attr_err: 1.0000, 1.0000, 1.0000, 1.0000, 1.0000
motorcycle Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
2.34, 4.07, 4.36, 4.62
trans_err, scale_err, orient_err, vel_err, attr_err: 0.3189, 0.2389, 1.1627, 0.5674, 0.7895
pedestrian Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
40.24, 43.26, 45.30, 48.25
trans_err, scale_err, orient_err, vel_err, attr_err: 0.2348, 0.2759, 0.7342, 0.3170, 0.0745
traffic_cone Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
2.55, 3.14, 4.40, 7.25
trans_err, scale_err, orient_err, vel_err, attr_err: 0.3763, 0.4232, nan, nan, nan
trailer Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
0.00, 0.41, 3.24, 6.72
trans_err, scale_err, orient_err, vel_err, attr_err: 0.8463, 0.2332, 1.1729, 0.2161, 0.2136
truck Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
1.36, 5.70, 8.84, 10.78
trans_err, scale_err, orient_err, vel_err, attr_err: 0.5058, 0.2423, 0.8884, 0.1464, 0.3445

demmerichs, Sep 20 '20 21:09

So I did another run. For future reference:

  • I turned training of velocities back off (that was the default), so the velocity errors are back to 1 (I assume they are clipped to 1 for the NDS score; the actual errors are probably larger).
  • I doubled the number of training iterations from the default config value of 58650 to 117300. I also doubled steps_per_eval and the summary config values, and halved the maximum learning rate lr_max (from 3e-3 to 1.5e-3).
  • The parameters groundtruth_localization_noise_std and groundtruth_rotation_uniform_noise are set to zero by default in nuscenes/all.pp.mida.config, with non-zero values commented out. I used the non-zero values in this run.
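The schedule change in the second bullet is plain arithmetic, sketched below. The function scale_schedule and the dict layout are made up for illustration and are not how the repo's protobuf config is edited. Only steps (58650) and lr_max (3e-3) are taken from this thread; the steps_per_eval value is a placeholder.

```python
def scale_schedule(cfg, factor, step_keys=("steps", "steps_per_eval")):
    """Stretch a training schedule by `factor`.

    Step counts are multiplied by the factor and the peak learning rate
    is divided by it. Hypothetical helper for illustration only.
    """
    scaled = dict(cfg)
    for key in step_keys:
        scaled[key] = int(round(cfg[key] * factor))
    scaled["lr_max"] = cfg["lr_max"] / factor
    return scaled


base = {
    "steps": 58650,          # default mentioned above
    "steps_per_eval": 5865,  # placeholder value, check your own config
    "lr_max": 3e-3,          # default mentioned above
}

longer = scale_schedule(base, 2)
print(longer["steps"], longer["lr_max"])
```

With factor 2 this gives the 117300 steps and the 1.5e-3 learning rate used in this run.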

Results are much better now:

Nusc v1.0-trainval Evaluation
barrier Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
8.77, 19.91, 25.31, 29.29
trans_err, scale_err, orient_err, vel_err, attr_err: 0.5038, 0.3089, 0.0795, nan, nan
bicycle Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
0.00, 0.00, 0.00, 0.00
trans_err, scale_err, orient_err, vel_err, attr_err: 1.0000, 1.0000, 1.0000, 1.0000, 1.0000
bus Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
18.71, 40.33, 53.45, 56.18
trans_err, scale_err, orient_err, vel_err, attr_err: 0.4433, 0.1816, 0.5368, 1.0000, 0.8788
car Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
61.21, 75.73, 80.39, 82.10
trans_err, scale_err, orient_err, vel_err, attr_err: 0.2331, 0.1565, 0.3301, 1.0000, 0.4892
construction_vehicle Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
0.00, 0.06, 2.05, 3.96
trans_err, scale_err, orient_err, vel_err, attr_err: 0.9541, 0.5010, 1.5074, 1.0000, 0.3943
motorcycle Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
13.03, 17.50, 17.76, 18.11
trans_err, scale_err, orient_err, vel_err, attr_err: 0.2709, 0.2395, 0.9403, 1.0000, 0.7852
pedestrian Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
49.92, 53.01, 55.43, 58.28
trans_err, scale_err, orient_err, vel_err, attr_err: 0.2069, 0.2756, 1.1877, 1.0000, 0.1486
traffic_cone Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
12.95, 14.82, 17.72, 23.12
trans_err, scale_err, orient_err, vel_err, attr_err: 0.3155, 0.3728, nan, nan, nan
trailer Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
1.65, 14.02, 24.98, 31.80
trans_err, scale_err, orient_err, vel_err, attr_err: 0.6503, 0.2065, 0.8314, 1.0000, 0.2412
truck Nusc dist AP@0.5, 1.0, 2.0, 4.0 and TP errors
9.92, 22.59, 28.56, 31.10
trans_err, scale_err, orient_err, vel_err, attr_err: 0.4275, 0.2079, 0.3662, 1.0000, 0.3992

For a better overview, here are the per-class APs averaged over all four distance thresholds, together with the mean AP and the NDS score:

  "mean_dist_aps": {
    "car": 0.7485639454052269,
    "truck": 0.23041285333587602,
    "bus": 0.4216736238804766,
    "trailer": 0.18110548314711833,
    "construction_vehicle": 0.015189000916920764,
    "pedestrian": 0.5416020005685676,
    "motorcycle": 0.1660028163976607,
    "bicycle": 0.0,
    "traffic_cone": 0.17154555139282626,
    "barrier": 0.20821119312361125
  },
  "mean_ap": 0.2684306468168284,
  "nd_score": 0.3201250428101572
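For anyone who wants to see how these summary numbers relate to the per-class log, here is a small sketch that recomputes them. It assumes the nuScenes detection score definition, NDS = (5 * mAP + sum over the five TP metrics of (1 - min(1, mean error))) / 10, with nan entries skipped when averaging an error over classes. The TP errors are copied from the second run's log and are rounded to four digits, so the NDS is only reproduced approximately.

```python
import math

nan = float("nan")

# Per-class AP averaged over the four distance thresholds (second run).
mean_dist_aps = {
    "car": 0.7485639454052269,
    "truck": 0.23041285333587602,
    "bus": 0.4216736238804766,
    "trailer": 0.18110548314711833,
    "construction_vehicle": 0.015189000916920764,
    "pedestrian": 0.5416020005685676,
    "motorcycle": 0.1660028163976607,
    "bicycle": 0.0,
    "traffic_cone": 0.17154555139282626,
    "barrier": 0.20821119312361125,
}

# Columns: trans_err, scale_err, orient_err, vel_err, attr_err (second run).
tp_errors = {
    "barrier": (0.5038, 0.3089, 0.0795, nan, nan),
    "bicycle": (1.0000, 1.0000, 1.0000, 1.0000, 1.0000),
    "bus": (0.4433, 0.1816, 0.5368, 1.0000, 0.8788),
    "car": (0.2331, 0.1565, 0.3301, 1.0000, 0.4892),
    "construction_vehicle": (0.9541, 0.5010, 1.5074, 1.0000, 0.3943),
    "motorcycle": (0.2709, 0.2395, 0.9403, 1.0000, 0.7852),
    "pedestrian": (0.2069, 0.2756, 1.1877, 1.0000, 0.1486),
    "traffic_cone": (0.3155, 0.3728, nan, nan, nan),
    "trailer": (0.6503, 0.2065, 0.8314, 1.0000, 0.2412),
    "truck": (0.4275, 0.2079, 0.3662, 1.0000, 0.3992),
}


def nanmean(values):
    """Mean of the values that are not nan."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept)


# mAP is the plain average of the per-class values.
mean_ap = sum(mean_dist_aps.values()) / len(mean_dist_aps)

# One score per TP metric: average the error over classes, clip at 1, invert.
tp_scores = [1.0 - min(1.0, nanmean(column)) for column in zip(*tp_errors.values())]

nd_score = (5.0 * mean_ap + sum(tp_scores)) / 10.0

print("mean_ap  =", mean_ap)
print("nd_score =", nd_score)
```

The velocity column is 1.0 for every class that has it, so its score is zero and it contributes nothing to the NDS in this run.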

demmerichs, Sep 22 '20 12:09