mmskeleton
Preparing a custom dataset from videos
Hey, I want to prepare a custom dataset from videos of the following actions:
CALL: answer phone call
COUG: cough
DRIN: drink water
SCRA: scratch head
SNEE: sneeze
STRE: stretch arms
WAVE: wave hand
WIPE: wipe glasses
I am using this dataset: https://web.bii.a-star.edu.sg/~chengli/FluRecognition.html
Can you explain the following terms from build_dataset_example.yaml?
How should I calculate image_size, pixel_std, image_mean, and image_std for this video dataset?
I have tried preparing the dataset with the default parameters and started the training process, but the training loss does not decrease and the accuracy was 0.000.
INFO:mmcv.runner.runner:Epoch [11][100/840] lr: 0.10000, eta: 0:43:07, time: 0.060, data_time: 0.026, memory: 2344, loss: 2.4426 INFO:mmcv.runner.runner:Epoch [11][200/840] lr: 0.10000, eta: 0:43:02, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4421 INFO:mmcv.runner.runner:Epoch [11][300/840] lr: 0.10000, eta: 0:42:58, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4406 INFO:mmcv.runner.runner:Epoch [11][400/840] lr: 0.10000, eta: 0:42:53, time: 0.059, data_time: 0.025, memory: 2344, loss: 2.4408 INFO:mmcv.runner.runner:Epoch [11][500/840] lr: 0.10000, eta: 0:42:49, time: 0.059, data_time: 0.025, memory: 2344, loss: 2.4415 INFO:mmcv.runner.runner:Epoch [11][600/840] lr: 0.10000, eta: 0:42:44, time: 0.059, data_time: 0.025, memory: 2344, loss: 2.4418 INFO:mmcv.runner.runner:Epoch [11][700/840] lr: 0.10000, eta: 0:42:40, time: 0.058, data_time: 0.023, memory: 2344, loss: 2.4420 INFO:mmcv.runner.runner:Epoch [11][800/840] lr: 0.10000, eta: 0:42:35, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4426 INFO:mmcv.runner.runner:Epoch [12][100/840] lr: 0.10000, eta: 0:42:18, time: 0.061, data_time: 0.027, memory: 2344, loss: 2.4422 INFO:mmcv.runner.runner:Epoch [12][200/840] lr: 0.10000, eta: 0:42:14, time: 0.059, data_time: 0.026, memory: 2344, loss: 2.4419 INFO:mmcv.runner.runner:Epoch [12][300/840] lr: 0.10000, eta: 0:42:10, time: 0.059, data_time: 0.026, memory: 2344, loss: 2.4410 INFO:mmcv.runner.runner:Epoch [12][400/840] lr: 0.10000, eta: 0:42:05, time: 0.058, data_time: 0.025, memory: 2344, loss: 2.4407 INFO:mmcv.runner.runner:Epoch [12][500/840] lr: 0.10000, eta: 0:42:01, time: 0.059, data_time: 0.026, memory: 2344, loss: 2.4412 INFO:mmcv.runner.runner:Epoch [12][600/840] lr: 0.10000, eta: 0:41:57, time: 0.059, data_time: 0.025, memory: 2344, loss: 2.4424 INFO:mmcv.runner.runner:Epoch [12][700/840] lr: 0.10000, eta: 0:41:52, time: 0.059, data_time: 0.025, memory: 2344, loss: 2.4421 INFO:mmcv.runner.runner:Epoch [12][800/840] lr: 0.10000, eta: 0:41:47, time: 0.059, data_time: 0.024, memory: 2344, loss: 2.4425 INFO:mmcv.runner.runner:Epoch [13][100/840] lr: 0.10000, eta: 0:41:31, time: 0.060, data_time: 0.025, memory: 2344, loss: 2.4422 INFO:mmcv.runner.runner:Epoch [13][200/840] lr: 0.10000, eta: 0:41:27, time: 0.059, data_time: 0.025, memory: 2344, loss: 2.4421 INFO:mmcv.runner.runner:Epoch [13][300/840] lr: 0.10000, eta: 0:41:22, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4399 INFO:mmcv.runner.runner:Epoch [13][400/840] lr: 0.10000, eta: 0:41:17, time: 0.058, data_time: 0.025, memory: 2344, loss: 2.4418 INFO:mmcv.runner.runner:Epoch [13][500/840] lr: 0.10000, eta: 0:41:13, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4417 INFO:mmcv.runner.runner:Epoch [13][600/840] lr: 0.10000, eta: 0:41:08, time: 0.059, data_time: 0.025, memory: 2344, loss: 2.4422 INFO:mmcv.runner.runner:Epoch [13][700/840] lr: 0.10000, eta: 0:41:03, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4426 INFO:mmcv.runner.runner:Epoch [13][800/840] lr: 0.10000, eta: 0:40:59, time: 0.059, data_time: 0.024, memory: 2344, loss: 2.4421 INFO:mmcv.runner.runner:Epoch [14][100/840] lr: 0.10000, eta: 0:40:43, time: 0.060, data_time: 0.025, memory: 2344, loss: 2.4422 INFO:mmcv.runner.runner:Epoch [14][200/840] lr: 0.10000, eta: 0:40:39, time: 0.058, data_time: 0.025, memory: 2344, loss: 2.4421 INFO:mmcv.runner.runner:Epoch [14][300/840] lr: 0.10000, eta: 0:40:34, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4413 INFO:mmcv.runner.runner:Epoch [14][400/840] lr: 0.10000, eta: 0:40:29, 
time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4411 INFO:mmcv.runner.runner:Epoch [14][500/840] lr: 0.10000, eta: 0:40:24, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4424 INFO:mmcv.runner.runner:Epoch [14][600/840] lr: 0.10000, eta: 0:40:19, time: 0.058, data_time: 0.025, memory: 2344, loss: 2.4417 INFO:mmcv.runner.runner:Epoch [14][700/840] lr: 0.10000, eta: 0:40:15, time: 0.059, data_time: 0.024, memory: 2344, loss: 2.4424 INFO:mmcv.runner.runner:Epoch [14][800/840] lr: 0.10000, eta: 0:40:10, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4424 INFO:mmcv.runner.runner:Epoch [15][100/840] lr: 0.10000, eta: 0:39:55, time: 0.060, data_time: 0.025, memory: 2344, loss: 2.4422 INFO:mmcv.runner.runner:Epoch [15][200/840] lr: 0.10000, eta: 0:39:50, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4419 INFO:mmcv.runner.runner:Epoch [15][300/840] lr: 0.10000, eta: 0:39:45, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4410 INFO:mmcv.runner.runner:Epoch [15][400/840] lr: 0.10000, eta: 0:39:40, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4410 INFO:mmcv.runner.runner:Epoch [15][500/840] lr: 0.10000, eta: 0:39:36, time: 0.059, data_time: 0.024, memory: 2344, loss: 2.4421 INFO:mmcv.runner.runner:Epoch [15][600/840] lr: 0.10000, eta: 0:39:31, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4411 INFO:mmcv.runner.runner:Epoch [15][700/840] lr: 0.10000, eta: 0:39:26, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4417 INFO:mmcv.runner.runner:Epoch [15][800/840] lr: 0.10000, eta: 0:39:21, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4419 INFO:mmcv.runner.runner:Epoch(train) [15][18] loss: 2.2971, top1: 0.0000, top5: 0.0000 INFO:mmcv.runner.runner:Epoch [16][100/840] lr: 0.10000, eta: 0:39:07, time: 0.060, data_time: 0.025, memory: 2344, loss: 2.4426 INFO:mmcv.runner.runner:Epoch [16][200/840] lr: 0.10000, eta: 0:39:02, time: 0.058, data_time: 0.025, memory: 2344, loss: 2.4419 INFO:mmcv.runner.runner:Epoch [16][300/840] lr: 0.10000, eta: 0:38:57, time: 0.058, data_time: 0.024, memory: 2344, loss: 2.4407 INFO:mmcv.runner.runner:Epoch [16][400/840] lr: 0.10000, eta: 0:38:52, time: 0.058, data_time: 0.025, memory: 2344, loss: 2.4410
I have used the following train.yaml:
```yaml
argparse_cfg:
  gpus:
    bind_to: processor_cfg.gpus
    help: number of gpus
  work_dir:
    bind_to: processor_cfg.work_dir
    help: the dir to save logs and models
  batch_size:
    bind_to: processor_cfg.batch_size
  resume_from:
    bind_to: processor_cfg.resume_from
    help: the checkpoint file to resume from

processor_cfg:
  type: 'processor.recognition.train'
  workers: 2

  # model setting
  model_cfg:
    type: 'models.backbones.ST_GCN_18'
    in_channels: 3
    num_class: 8
    edge_importance_weighting: True
    graph_cfg:
      layout: 'coco'
      strategy: 'spatial'
  loss_cfg:
    type: 'torch.nn.CrossEntropyLoss'

  # dataset setting
  dataset_cfg:
    # training set
    - type: "datasets.DataPipeline"
      data_source:
        type: "datasets.SkeletonLoader"
        data_dir: ./data/symptoms_data/train
        num_track: 2
        num_keypoints: 17
        repeat: 20
      pipeline:
        - {type: "datasets.skeleton.normalize_by_resolution"}
        - {type: "datasets.skeleton.mask_by_visibility"}
        - {type: "datasets.skeleton.pad_zero", size: 150}
        - {type: "datasets.skeleton.random_crop", size: 150}
        - {type: "datasets.skeleton.simulate_camera_moving"}
        - {type: "datasets.skeleton.transpose", order: [0, 2, 1, 3]}
        - {type: "datasets.skeleton.to_tuple"}
    - type: "datasets.DataPipeline"
      data_source:
        type: "datasets.SkeletonLoader"
        data_dir: ./data/symptoms_data/val
        num_track: 2
        num_keypoints: 17
      pipeline:
        - {type: "datasets.skeleton.normalize_by_resolution"}
        - {type: "datasets.skeleton.mask_by_visibility"}
        - {type: "datasets.skeleton.pad_zero", size: 300}
        - {type: "datasets.skeleton.random_crop", size: 300}
        - {type: "datasets.skeleton.transpose", order: [0, 2, 1, 3]}
        - {type: "datasets.skeleton.to_tuple"}

  # dataloader setting
  batch_size: 32
  gpus: 3

  # optimizer setting
  optimizer_cfg:
    type: 'torch.optim.SGD'
    lr: 0.1
    momentum: 0.9
    nesterov: true
    weight_decay: 0.0001

  # runtime setting
  workflow: [['train', 5], ['val', 1]]
  work_dir: ./work_dir/recognition/st_gcn/symptoms_data
  total_epochs: 65
  training_hooks:
    lr_config:
      policy: 'step'
      step: [20, 30, 40, 50]
    log_config:
      interval: 100
      hooks:
        - type: TextLoggerHook
    checkpoint_config:
      interval: 5
    optimizer_config:
  resume_from:
  load_from:
```
and build_dataset_example.yaml:
```yaml
processor_cfg:
  type: "processor.skeleton_dataset.build"
  gpus: 1
  worker_per_gpu: 2
  video_dir: data/symptoms_data/videos
  out_dir: "data/symptoms_data/dataset"
  category_annotation: resource/category_annotations_symptoms.json
  detection_cfg:
    model_cfg: configs/mmdet/cascade_rcnn_r50_fpn_1x.py
    checkpoint_file: mmskeleton://mmdet/cascade_rcnn_r50_fpn_20e
    bbox_thre: 0.8
  estimation_cfg:
    model_cfg: configs/pose_estimation/hrnet/pose_hrnet_w32_256x192_test.yaml
    checkpoint_file: mmskeleton://pose_estimation/pose_hrnet_w32_256x192
    data_cfg:
      image_size:
        - 192
        - 256
      pixel_std: 200
      image_mean:
        - 0.485
        - 0.456
        - 0.406
      image_std:
        - 0.229
        - 0.224
        - 0.225
      post_process: true
  tracker_cfg: null

argparse_cfg:
  gpus:
    bind_to: processor_cfg.gpus
    help: number of gpus
  worker_per_gpu:
    bind_to: processor_cfg.worker_per_gpu
    help: number of workers for each gpu
  video_dir:
    bind_to: processor_cfg.video_dir
    help: folder for videos
  category_annotation:
    bind_to: processor_cfg.category_annotation
    help: a json file recording video category annotation
  out_dir:
    bind_to: processor_cfg.out_dir
    help: folder for storing output dataset
  skeleton_model:
    bind_to: processor_cfg.estimation_cfg.model_cfg
  skeleton_checkpoint:
    bind_to: processor_cfg.estimation_cfg.checkpoint_file
  detection_model:
    bind_to: processor_cfg.detection_cfg.model_cfg
  detection_checkpoint:
    bind_to: processor_cfg.detection_cfg.checkpoint_file
```
Why is the loss not decreasing? How do I get the correct configuration parameters for building a custom dataset?
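A note on the data_cfg fields asked about above (background, not part of the original replies): image_size (192 x 256) is the input width x height of the HRNet pose estimator, pixel_std: 200 is the standard COCO-keypoint convention for scaling person boxes, and image_mean / image_std are the ImageNet channel statistics expected by the pretrained HRNet, so they are normally left at these defaults rather than recomputed per dataset. A minimal sketch of how the mean/std are applied to each cropped person patch (torchvision-style; it mirrors, but is not copied from, the mmskeleton preprocessing, and the file name is made up):

```python
# Hedged illustration: ImageNet-style normalization applied to a person crop
# before pose estimation. The values match the config above; the transform
# itself is an assumption about the pipeline, not verbatim mmskeleton code.
from PIL import Image
import torchvision.transforms as T

preprocess = T.Compose([
    T.Resize((256, 192)),                      # (height, width), i.e. image_size reversed
    T.ToTensor(),                              # HWC uint8 -> CHW float in [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],    # image_mean
                std=[0.229, 0.224, 0.225]),    # image_std
])

patch = preprocess(Image.open("person_crop.jpg").convert("RGB"))  # hypothetical file
print(patch.shape)  # torch.Size([3, 256, 192])
```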
Me too.
Same question...
Hello, I have met the same problem and the loss does not decrease. Have you solved it? I am looking forward to your reply, many thanks!
Hey, No!
Same here, sadly...
INFO:mmcv.runner.runner:workflow: [('train', 5), ('val', 1)], max: 65 epochs INFO:mmcv.runner.runner:Epoch [1][100/125] lr: 0.10000, eta: 0:16:44, time: 0.125, data_time: 0.015, memory: 858, loss: 2.3419 INFO:mmcv.runner.runner:Epoch [2][100/125] lr: 0.10000, eta: 0:10:46, time: 0.059, data_time: 0.014, memory: 858, loss: 2.3402 INFO:mmcv.runner.runner:Epoch [3][100/125] lr: 0.10000, eta: 0:08:59, time: 0.059, data_time: 0.014, memory: 858, loss: 2.3420 INFO:mmcv.runner.runner:Epoch [4][100/125] lr: 0.10000, eta: 0:08:07, time: 0.060, data_time: 0.014, memory: 858, loss: 2.3415 INFO:mmcv.runner.runner:Epoch [5][100/125] lr: 0.10000, eta: 0:07:34, time: 0.060, data_time: 0.015, memory: 858, loss: 2.3466 INFO:mmcv.runner.runner:Epoch(train) [5][6] loss: 2.3335, top1: 0.1458, top5: 0.5312 INFO:mmcv.runner.runner:Epoch [6][100/125] lr: 0.10000, eta: 0:07:09, time: 0.058, data_time: 0.014, memory: 859, loss: 2.3459 INFO:mmcv.runner.runner:Epoch [7][100/125] lr: 0.10000, eta: 0:06:50, time: 0.058, data_time: 0.014, memory: 859, loss: 2.3443 INFO:mmcv.runner.runner:Epoch [8][100/125] lr: 0.10000, eta: 0:06:35, time: 0.060, data_time: 0.016, memory: 859, loss: 2.3476 INFO:mmcv.runner.runner:Epoch [9][100/125] lr: 0.10000, eta: 0:06:22, time: 0.060, data_time: 0.016, memory: 859, loss: 2.3418 INFO:mmcv.runner.runner:Epoch [10][100/125] lr: 0.10000, eta: 0:06:10, time: 0.060, data_time: 0.016, memory: 859, loss: 2.3433 INFO:mmcv.runner.runner:Epoch(train) [10][6] loss: 2.3405, top1: 0.1250, top5: 0.5104 INFO:mmcv.runner.runner:Epoch [11][100/125] lr: 0.10000, eta: 0:05:59, time: 0.059, data_time: 0.015, memory: 859, loss: 2.3361 INFO:mmcv.runner.runner:Epoch [12][100/125] lr: 0.10000, eta: 0:05:50, time: 0.059, data_time: 0.016, memory: 859, loss: 2.3499 INFO:mmcv.runner.runner:Epoch [13][100/125] lr: 0.10000, eta: 0:05:40, time: 0.058, data_time: 0.016, memory: 859, loss: 2.3451 INFO:mmcv.runner.runner:Epoch [14][100/125] lr: 0.10000, eta: 0:05:31, time: 0.059, data_time: 0.014, memory: 859, loss: 2.3369 INFO:mmcv.runner.runner:Epoch [15][100/125] lr: 0.10000, eta: 0:05:23, time: 0.059, data_time: 0.016, memory: 859, loss: 2.3392 INFO:mmcv.runner.runner:Epoch(train) [15][6] loss: 2.3434, top1: 0.1250, top5: 0.4792 INFO:mmcv.runner.runner:Epoch [16][100/125] lr: 0.10000, eta: 0:05:15, time: 0.059, data_time: 0.017, memory: 859, loss: 2.3394 INFO:mmcv.runner.runner:Epoch [17][100/125] lr: 0.10000, eta: 0:05:07, time: 0.059, data_time: 0.015, memory: 859, loss: 2.3448 INFO:mmcv.runner.runner:Epoch [18][100/125] lr: 0.10000, eta: 0:04:59, time: 0.059, data_time: 0.017, memory: 859, loss: 2.3451 INFO:mmcv.runner.runner:Epoch [19][100/125] lr: 0.10000, eta: 0:04:52, time: 0.060, data_time: 0.016, memory: 859, loss: 2.3315 INFO:mmcv.runner.runner:Epoch [20][100/125] lr: 0.10000, eta: 0:04:45, time: 0.060, data_time: 0.017, memory: 859, loss: 2.3446 INFO:mmcv.runner.runner:Epoch(train) [20][6] loss: 2.3341, top1: 0.1250, top5: 0.5312 INFO:mmcv.runner.runner:Epoch [21][100/125] lr: 0.01000, eta: 0:04:37, time: 0.059, data_time: 0.016, memory: 859, loss: 2.3390 INFO:mmcv.runner.runner:Epoch [22][100/125] lr: 0.01000, eta: 0:04:30, time: 0.060, data_time: 0.015, memory: 859, loss: 2.3439 INFO:mmcv.runner.runner:Epoch [23][100/125] lr: 0.01000, eta: 0:04:23, time: 0.058, data_time: 0.014, memory: 859, loss: 2.3418 INFO:mmcv.runner.runner:Epoch [24][100/125] lr: 0.01000, eta: 0:04:17, time: 0.059, data_time: 0.015, memory: 859, loss: 2.3454 INFO:mmcv.runner.runner:Epoch [25][100/125] lr: 0.01000, eta: 0:04:10, 
time: 0.059, data_time: 0.015, memory: 859, loss: 2.3470 INFO:mmcv.runner.runner:Epoch(train) [25][6] loss: 2.3479, top1: 0.1250, top5: 0.5000 INFO:mmcv.runner.runner:Epoch [26][100/125] lr: 0.01000, eta: 0:04:03, time: 0.060, data_time: 0.015, memory: 859, loss: 2.3462 INFO:mmcv.runner.runner:Epoch [27][100/125] lr: 0.01000, eta: 0:03:57, time: 0.059, data_time: 0.016, memory: 859, loss: 2.3369 INFO:mmcv.runner.runner:Epoch [28][100/125] lr: 0.01000, eta: 0:03:50, time: 0.058, data_time: 0.016, memory: 859, loss: 2.3454 INFO:mmcv.runner.runner:Epoch [29][100/125] lr: 0.01000, eta: 0:03:43, time: 0.060, data_time: 0.015, memory: 859, loss: 2.3453 INFO:mmcv.runner.runner:Epoch [30][100/125] lr: 0.01000, eta: 0:03:37, time: 0.059, data_time: 0.016, memory: 859, loss: 2.3449 INFO:mmcv.runner.runner:Epoch(train) [30][6] loss: 2.3513, top1: 0.1250, top5: 0.4896 INFO:mmcv.runner.runner:Epoch [31][100/125] lr: 0.00100, eta: 0:03:30, time: 0.059, data_time: 0.017, memory: 859, loss: 2.3327 INFO:mmcv.runner.runner:Epoch [32][100/125] lr: 0.00100, eta: 0:03:24, time: 0.059, data_time: 0.017, memory: 859, loss: 2.3501 INFO:mmcv.runner.runner:Epoch [33][100/125] lr: 0.00100, eta: 0:03:18, time: 0.060, data_time: 0.015, memory: 859, loss: 2.3438 INFO:mmcv.runner.runner:Epoch [34][100/125] lr: 0.00100, eta: 0:03:11, time: 0.058, data_time: 0.015, memory: 859, loss: 2.3404 INFO:mmcv.runner.runner:Epoch [35][100/125] lr: 0.00100, eta: 0:03:05, time: 0.059, data_time: 0.018, memory: 859, loss: 2.3446 INFO:mmcv.runner.runner:Epoch(train) [35][6] loss: 2.3447, top1: 0.1458, top5: 0.5521 INFO:mmcv.runner.runner:Epoch [36][100/125] lr: 0.00100, eta: 0:02:58, time: 0.059, data_time: 0.014, memory: 859, loss: 2.3453 INFO:mmcv.runner.runner:Epoch [37][100/125] lr: 0.00100, eta: 0:02:52, time: 0.060, data_time: 0.016, memory: 859, loss: 2.3467 INFO:mmcv.runner.runner:Epoch [38][100/125] lr: 0.00100, eta: 0:02:46, time: 0.059, data_time: 0.014, memory: 859, loss: 2.3416 INFO:mmcv.runner.runner:Epoch [39][100/125] lr: 0.00100, eta: 0:02:40, time: 0.060, data_time: 0.017, memory: 859, loss: 2.3470 INFO:mmcv.runner.runner:Epoch [40][100/125] lr: 0.00100, eta: 0:02:33, time: 0.059, data_time: 0.018, memory: 859, loss: 2.3481 INFO:mmcv.runner.runner:Epoch(train) [40][6] loss: 2.3345, top1: 0.1458, top5: 0.5417 INFO:mmcv.runner.runner:Epoch [41][100/125] lr: 0.00010, eta: 0:02:27, time: 0.060, data_time: 0.015, memory: 859, loss: 2.3414 INFO:mmcv.runner.runner:Epoch [42][100/125] lr: 0.00010, eta: 0:02:21, time: 0.060, data_time: 0.018, memory: 859, loss: 2.3456 INFO:mmcv.runner.runner:Epoch [43][100/125] lr: 0.00010, eta: 0:02:15, time: 0.059, data_time: 0.015, memory: 859, loss: 2.3401 INFO:mmcv.runner.runner:Epoch [44][100/125] lr: 0.00010, eta: 0:02:09, time: 0.059, data_time: 0.016, memory: 859, loss: 2.3525 INFO:mmcv.runner.runner:Epoch [45][100/125] lr: 0.00010, eta: 0:02:03, time: 0.059, data_time: 0.015, memory: 859, loss: 2.3436 INFO:mmcv.runner.runner:Epoch(train) [45][6] loss: 2.3390, top1: 0.1042, top5: 0.5208 INFO:mmcv.runner.runner:Epoch [46][100/125] lr: 0.00010, eta: 0:01:56, time: 0.059, data_time: 0.015, memory: 859, loss: 2.3454 INFO:mmcv.runner.runner:Epoch [47][100/125] lr: 0.00010, eta: 0:01:50, time: 0.060, data_time: 0.016, memory: 859, loss: 2.3395 INFO:mmcv.runner.runner:Epoch [48][100/125] lr: 0.00010, eta: 0:01:44, time: 0.059, data_time: 0.015, memory: 859, loss: 2.3419 INFO:mmcv.runner.runner:Epoch [49][100/125] lr: 0.00010, eta: 0:01:38, time: 0.059, data_time: 0.017, memory: 859, loss: 
2.3367 INFO:mmcv.runner.runner:Epoch [50][100/125] lr: 0.00010, eta: 0:01:32, time: 0.060, data_time: 0.016, memory: 859, loss: 2.3467 INFO:mmcv.runner.runner:Epoch(train) [50][6] loss: 2.3346, top1: 0.1354, top5: 0.5000 INFO:mmcv.runner.runner:Epoch [51][100/125] lr: 0.00001, eta: 0:01:26, time: 0.060, data_time: 0.016, memory: 859, loss: 2.3390 INFO:mmcv.runner.runner:Epoch [52][100/125] lr: 0.00001, eta: 0:01:20, time: 0.060, data_time: 0.016, memory: 859, loss: 2.3441 INFO:mmcv.runner.runner:Epoch [53][100/125] lr: 0.00001, eta: 0:01:14, time: 0.061, data_time: 0.018, memory: 859, loss: 2.3491 INFO:mmcv.runner.runner:Epoch [54][100/125] lr: 0.00001, eta: 0:01:08, time: 0.061, data_time: 0.016, memory: 859, loss: 2.3433 INFO:mmcv.runner.runner:Epoch [55][100/125] lr: 0.00001, eta: 0:01:01, time: 0.061, data_time: 0.017, memory: 859, loss: 2.3393 INFO:mmcv.runner.runner:Epoch(train) [55][6] loss: 2.3434, top1: 0.1458, top5: 0.5208 INFO:mmcv.runner.runner:Epoch [56][100/125] lr: 0.00001, eta: 0:00:55, time: 0.062, data_time: 0.017, memory: 859, loss: 2.3490 INFO:mmcv.runner.runner:Epoch [57][100/125] lr: 0.00001, eta: 0:00:49, time: 0.061, data_time: 0.014, memory: 859, loss: 2.3373 INFO:mmcv.runner.runner:Epoch [58][100/125] lr: 0.00001, eta: 0:00:43, time: 0.062, data_time: 0.016, memory: 859, loss: 2.3434 INFO:mmcv.runner.runner:Epoch [59][100/125] lr: 0.00001, eta: 0:00:37, time: 0.063, data_time: 0.015, memory: 859, loss: 2.3424 INFO:mmcv.runner.runner:Epoch [60][100/125] lr: 0.00001, eta: 0:00:31, time: 0.063, data_time: 0.015, memory: 859, loss: 2.3543 INFO:mmcv.runner.runner:Epoch(train) [60][6] loss: 2.3233, top1: 0.1458, top5: 0.5312 INFO:mmcv.runner.runner:Epoch [61][100/125] lr: 0.00001, eta: 0:00:25, time: 0.063, data_time: 0.015, memory: 859, loss: 2.3460 INFO:mmcv.runner.runner:Epoch [62][100/125] lr: 0.00001, eta: 0:00:19, time: 0.062, data_time: 0.016, memory: 859, loss: 2.3449 INFO:mmcv.runner.runner:Epoch [63][100/125] lr: 0.00001, eta: 0:00:13, time: 0.063, data_time: 0.014, memory: 859, loss: 2.3469 INFO:mmcv.runner.runner:Epoch [64][100/125] lr: 0.00001, eta: 0:00:07, time: 0.064, data_time: 0.016, memory: 859, loss: 2.3478 INFO:mmcv.runner.runner:Epoch [65][100/125] lr: 0.00001, eta: 0:00:01, time: 0.064, data_time: 0.018, memory: 859, loss: 2.3395 INFO:mmcv.runner.runner:Epoch(train) [65][6] loss: 2.3542, top1: 0.1250, top5: 0.5104
This is my training log (10 categories, 10 samples per category). Is this training correct?
I found that the core training phase is done in the mmcv module (on my machine it is at /xxxxxxx/miniconda3/lib/python3.7/site-packages/mmcv-0.4.3-py3.7-linux-x86_64.egg/mmcv/runner/runner.py):
```python
def train(self, data_loader, **kwargs):
    self.model.train()
    self.mode = 'train'
    self.data_loader = data_loader
    self._max_iters = self._max_epochs * len(data_loader)
    self.call_hook('before_train_epoch')
    for i, data_batch in enumerate(data_loader):
        self._inner_iter = i
        self.call_hook('before_train_iter')
        outputs = self.batch_processor(
            self.model, data_batch, train_mode=True, **kwargs)
        if not isinstance(outputs, dict):
            raise TypeError('batch_processor() must return a dict')
        if 'log_vars' in outputs:
            self.log_buffer.update(outputs['log_vars'],
                                   outputs['num_samples'])
        self.outputs = outputs

        self.optimizer.zero_grad()
        self.outputs['loss'].backward()
        self.optimizer.step()

        self.call_hook('after_train_iter')
        self._iter += 1

    self.call_hook('after_train_epoch')
    self._epoch += 1
```
The loss backward operation is done by the hook function in /share/jiawenhao/miniconda3/lib/python3.7/site-packages/mmcv-0.4.3-py3.7-linux-x86_64.egg/mmcv/runner/hooks/optimizer.py:
```python
def after_train_iter(self, runner):
    runner.optimizer.zero_grad()
    runner.outputs['loss'].backward()
    if self.grad_clip is not None:
        self.clip_grads(runner.model.parameters())
    runner.optimizer.step()
```
I do not know why that hook function does not actually run.
**So I manually added these operations to runner.py, and then the loss could decrease:**

```python
self.optimizer.zero_grad()
self.outputs['loss'].backward()
self.optimizer.step()
```
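For context (not from the original comment): the runner's self.call_hook('after_train_iter') only dispatches to hooks that were actually registered. Paraphrased from mmcv.runner.Runner (the exact source may differ slightly between versions):

```python
# Paraphrase of mmcv's hook dispatch: every registered hook gets the callback
# named fn_name invoked with the runner as argument.
def call_hook(self, fn_name):
    for hook in self._hooks:
        getattr(hook, fn_name)(self)
```

So if no OptimizerHook instance ends up in self._hooks (for example because the optimizer_config entry of training_hooks is not turned into a valid hook), 'after_train_iter' is a silent no-op, the loss is never backpropagated, and the loss curve stays flat exactly as in the logs above.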
Thanks, I modified the code according to your suggestion, and after that the training is completely correct.
Finally, I modified the training_hooks configuration of the train.yaml file, and the changes are as follows:
```yaml
training_hooks:
  lr_config:
    policy: 'step'
    step: [20, 30, 40, 50]
  log_config:
    interval: 100
    hooks:
      - type: TextLoggerHook
  checkpoint_config:
    interval: 5
  optimizer_config:
    grad_clip:
```
Hey, can you share your complete train.yaml and the final values of the loss, training accuracy, and test accuracy?
This is my training log:
INFO:mmcv.runner.runner:workflow: [('train', 5), ('val', 1)], max: 65 epochs INFO:mmcv.runner.runner:Epoch [1][100/116] lr: 0.10000, eta: 0:17:31, time: 0.141, data_time: 0.007, memory: 456, loss: 2.2317 INFO:mmcv.runner.runner:Epoch [2][100/116] lr: 0.10000, eta: 0:12:11, time: 0.074, data_time: 0.011, memory: 456, loss: 1.7097 INFO:mmcv.runner.runner:Epoch [3][100/116] lr: 0.10000, eta: 0:10:27, time: 0.073, data_time: 0.012, memory: 456, loss: 1.6194 INFO:mmcv.runner.runner:Epoch [4][100/116] lr: 0.10000, eta: 0:09:35, time: 0.074, data_time: 0.011, memory: 456, loss: 1.5610 INFO:mmcv.runner.runner:Epoch [5][100/116] lr: 0.10000, eta: 0:09:03, time: 0.076, data_time: 0.012, memory: 456, loss: 1.4833 INFO:mmcv.runner.runner:Epoch(train) [5][5] loss: 1.4931, top1: 0.3500, top5: 1.0000 INFO:mmcv.runner.runner:Epoch [6][100/116] lr: 0.10000, eta: 0:08:37, time: 0.074, data_time: 0.012, memory: 456, loss: 1.4393 INFO:mmcv.runner.runner:Epoch [7][100/116] lr: 0.10000, eta: 0:08:17, time: 0.074, data_time: 0.011, memory: 456, loss: 1.3877 INFO:mmcv.runner.runner:Epoch [8][100/116] lr: 0.10000, eta: 0:08:00, time: 0.073, data_time: 0.010, memory: 456, loss: 1.2841 INFO:mmcv.runner.runner:Epoch [9][100/116] lr: 0.10000, eta: 0:07:45, time: 0.074, data_time: 0.011, memory: 456, loss: 1.1788 INFO:mmcv.runner.runner:Epoch [10][100/116] lr: 0.10000, eta: 0:07:32, time: 0.075, data_time: 0.012, memory: 456, loss: 1.0855 INFO:mmcv.runner.runner:Epoch(train) [10][5] loss: 1.2552, top1: 0.5125, top5: 1.0000 INFO:mmcv.runner.runner:Epoch [11][100/116] lr: 0.10000, eta: 0:07:20, time: 0.075, data_time: 0.011, memory: 456, loss: 0.8774 INFO:mmcv.runner.runner:Epoch [12][100/116] lr: 0.10000, eta: 0:07:09, time: 0.074, data_time: 0.011, memory: 456, loss: 0.5458 INFO:mmcv.runner.runner:Epoch [13][100/116] lr: 0.10000, eta: 0:06:58, time: 0.075, data_time: 0.011, memory: 456, loss: 0.3136 INFO:mmcv.runner.runner:Epoch [14][100/116] lr: 0.10000, eta: 0:06:48, time: 0.075, data_time: 0.011, memory: 456, loss: 0.2149 INFO:mmcv.runner.runner:Epoch [15][100/116] lr: 0.10000, eta: 0:06:39, time: 0.075, data_time: 0.012, memory: 456, loss: 0.1297 INFO:mmcv.runner.runner:Epoch(train) [15][5] loss: 0.1492, top1: 1.0000, top5: 1.0000 INFO:mmcv.runner.runner:Epoch [16][100/116] lr: 0.10000, eta: 0:06:29, time: 0.075, data_time: 0.011, memory: 456, loss: 0.1112 INFO:mmcv.runner.runner:Epoch [17][100/116] lr: 0.10000, eta: 0:06:20, time: 0.075, data_time: 0.012, memory: 456, loss: 0.0726 INFO:mmcv.runner.runner:Epoch [18][100/116] lr: 0.10000, eta: 0:06:11, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0188 INFO:mmcv.runner.runner:Epoch [19][100/116] lr: 0.10000, eta: 0:06:03, time: 0.075, data_time: 0.010, memory: 456, loss: 0.0255 INFO:mmcv.runner.runner:Epoch [20][100/116] lr: 0.10000, eta: 0:05:54, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0666 INFO:mmcv.runner.runner:Epoch(train) [20][5] loss: 0.0752, top1: 1.0000, top5: 1.0000 INFO:mmcv.runner.runner:Epoch [21][100/116] lr: 0.01000, eta: 0:05:45, time: 0.074, data_time: 0.011, memory: 456, loss: 0.0129 INFO:mmcv.runner.runner:Epoch [22][100/116] lr: 0.01000, eta: 0:05:37, time: 0.075, data_time: 0.010, memory: 456, loss: 0.0077 INFO:mmcv.runner.runner:Epoch [23][100/116] lr: 0.01000, eta: 0:05:28, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0054 INFO:mmcv.runner.runner:Epoch [24][100/116] lr: 0.01000, eta: 0:05:20, time: 0.074, data_time: 0.011, memory: 456, loss: 0.0059 INFO:mmcv.runner.runner:Epoch [25][100/116] lr: 0.01000, eta: 0:05:12, 
time: 0.076, data_time: 0.011, memory: 456, loss: 0.0062 INFO:mmcv.runner.runner:Epoch(train) [25][5] loss: 0.0363, top1: 1.0000, top5: 1.0000 INFO:mmcv.runner.runner:Epoch [26][100/116] lr: 0.01000, eta: 0:05:03, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0040 INFO:mmcv.runner.runner:Epoch [27][100/116] lr: 0.01000, eta: 0:04:55, time: 0.074, data_time: 0.011, memory: 456, loss: 0.0035 INFO:mmcv.runner.runner:Epoch [28][100/116] lr: 0.01000, eta: 0:04:47, time: 0.075, data_time: 0.012, memory: 456, loss: 0.0042 INFO:mmcv.runner.runner:Epoch [29][100/116] lr: 0.01000, eta: 0:04:39, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0038 INFO:mmcv.runner.runner:Epoch [30][100/116] lr: 0.01000, eta: 0:04:31, time: 0.076, data_time: 0.011, memory: 456, loss: 0.0034 INFO:mmcv.runner.runner:Epoch(train) [30][5] loss: 0.0447, top1: 1.0000, top5: 1.0000 INFO:mmcv.runner.runner:Epoch [31][100/116] lr: 0.00100, eta: 0:04:23, time: 0.074, data_time: 0.011, memory: 456, loss: 0.0044 INFO:mmcv.runner.runner:Epoch [32][100/116] lr: 0.00100, eta: 0:04:15, time: 0.075, data_time: 0.013, memory: 456, loss: 0.0039 INFO:mmcv.runner.runner:Epoch [33][100/116] lr: 0.00100, eta: 0:04:07, time: 0.076, data_time: 0.011, memory: 456, loss: 0.0050 INFO:mmcv.runner.runner:Epoch [34][100/116] lr: 0.00100, eta: 0:03:59, time: 0.074, data_time: 0.011, memory: 456, loss: 0.0039 INFO:mmcv.runner.runner:Epoch [35][100/116] lr: 0.00100, eta: 0:03:51, time: 0.074, data_time: 0.010, memory: 456, loss: 0.0044 INFO:mmcv.runner.runner:Epoch(train) [35][5] loss: 0.0343, top1: 1.0000, top5: 1.0000 INFO:mmcv.runner.runner:Epoch [36][100/116] lr: 0.00100, eta: 0:03:43, time: 0.076, data_time: 0.011, memory: 456, loss: 0.0034 INFO:mmcv.runner.runner:Epoch [37][100/116] lr: 0.00100, eta: 0:03:36, time: 0.075, data_time: 0.010, memory: 456, loss: 0.0036 INFO:mmcv.runner.runner:Epoch [38][100/116] lr: 0.00100, eta: 0:03:28, time: 0.075, data_time: 0.012, memory: 456, loss: 0.0031 INFO:mmcv.runner.runner:Epoch [39][100/116] lr: 0.00100, eta: 0:03:20, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0054 INFO:mmcv.runner.runner:Epoch [40][100/116] lr: 0.00100, eta: 0:03:12, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0041 INFO:mmcv.runner.runner:Epoch(train) [40][5] loss: 0.0408, top1: 1.0000, top5: 1.0000 INFO:mmcv.runner.runner:Epoch [41][100/116] lr: 0.00010, eta: 0:03:04, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0046 INFO:mmcv.runner.runner:Epoch [42][100/116] lr: 0.00010, eta: 0:02:57, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0039 INFO:mmcv.runner.runner:Epoch [43][100/116] lr: 0.00010, eta: 0:02:49, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0029 INFO:mmcv.runner.runner:Epoch [44][100/116] lr: 0.00010, eta: 0:02:41, time: 0.074, data_time: 0.011, memory: 456, loss: 0.0033 INFO:mmcv.runner.runner:Epoch [45][100/116] lr: 0.00010, eta: 0:02:34, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0030 INFO:mmcv.runner.runner:Epoch(train) [45][5] loss: 0.0346, top1: 1.0000, top5: 1.0000 INFO:mmcv.runner.runner:Epoch [46][100/116] lr: 0.00010, eta: 0:02:26, time: 0.076, data_time: 0.011, memory: 456, loss: 0.0039 INFO:mmcv.runner.runner:Epoch [47][100/116] lr: 0.00010, eta: 0:02:18, time: 0.076, data_time: 0.011, memory: 456, loss: 0.0030 INFO:mmcv.runner.runner:Epoch [48][100/116] lr: 0.00010, eta: 0:02:10, time: 0.074, data_time: 0.011, memory: 456, loss: 0.0032 INFO:mmcv.runner.runner:Epoch [49][100/116] lr: 0.00010, eta: 0:02:03, time: 0.075, data_time: 0.011, memory: 456, loss: 
0.0040 INFO:mmcv.runner.runner:Epoch [50][100/116] lr: 0.00010, eta: 0:01:55, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0033 INFO:mmcv.runner.runner:Epoch(train) [50][5] loss: 0.0390, top1: 1.0000, top5: 1.0000 INFO:mmcv.runner.runner:Epoch [51][100/116] lr: 0.00001, eta: 0:01:47, time: 0.076, data_time: 0.011, memory: 456, loss: 0.0039 INFO:mmcv.runner.runner:Epoch [52][100/116] lr: 0.00001, eta: 0:01:40, time: 0.075, data_time: 0.012, memory: 456, loss: 0.0031 INFO:mmcv.runner.runner:Epoch [53][100/116] lr: 0.00001, eta: 0:01:32, time: 0.076, data_time: 0.011, memory: 456, loss: 0.0036 INFO:mmcv.runner.runner:Epoch [54][100/116] lr: 0.00001, eta: 0:01:24, time: 0.075, data_time: 0.012, memory: 456, loss: 0.0061 INFO:mmcv.runner.runner:Epoch [55][100/116] lr: 0.00001, eta: 0:01:17, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0031 INFO:mmcv.runner.runner:Epoch(train) [55][5] loss: 0.0326, top1: 1.0000, top5: 1.0000 INFO:mmcv.runner.runner:Epoch [56][100/116] lr: 0.00001, eta: 0:01:09, time: 0.076, data_time: 0.010, memory: 456, loss: 0.0027 INFO:mmcv.runner.runner:Epoch [57][100/116] lr: 0.00001, eta: 0:01:02, time: 0.077, data_time: 0.011, memory: 456, loss: 0.0031 INFO:mmcv.runner.runner:Epoch [58][100/116] lr: 0.00001, eta: 0:00:54, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0028 INFO:mmcv.runner.runner:Epoch [59][100/116] lr: 0.00001, eta: 0:00:46, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0034 INFO:mmcv.runner.runner:Epoch [60][100/116] lr: 0.00001, eta: 0:00:39, time: 0.074, data_time: 0.011, memory: 456, loss: 0.0035 INFO:mmcv.runner.runner:Epoch(train) [60][5] loss: 0.0372, top1: 1.0000, top5: 1.0000 INFO:mmcv.runner.runner:Epoch [61][100/116] lr: 0.00001, eta: 0:00:31, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0040 INFO:mmcv.runner.runner:Epoch [62][100/116] lr: 0.00001, eta: 0:00:23, time: 0.076, data_time: 0.012, memory: 456, loss: 0.0034 INFO:mmcv.runner.runner:Epoch [63][100/116] lr: 0.00001, eta: 0:00:16, time: 0.076, data_time: 0.011, memory: 456, loss: 0.0034 INFO:mmcv.runner.runner:Epoch [64][100/116] lr: 0.00001, eta: 0:00:08, time: 0.075, data_time: 0.011, memory: 456, loss: 0.0032 INFO:mmcv.runner.runner:Epoch [65][100/116] lr: 0.00001, eta: 0:00:01, time: 0.076, data_time: 0.011, memory: 456, loss: 0.0032 INFO:mmcv.runner.runner:Epoch(train) [65][5] loss: 0.0310, top1: 1.0000, top5: 1.0000
This is my test log:
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 93/93, 89.6 task/s, elapsed: 1s, ETA: 0sTop 1: 100.00% Top 5: 100.00%
Hey, Did you use the same training configuration file as example_dataset?
train.yaml
```yaml
argparse_cfg:
  gpus:
    bind_to: processor_cfg.gpus
    help: number of gpus
  work_dir:
    bind_to: processor_cfg.work_dir
    help: the dir to save logs and models
  batch_size:
    bind_to: processor_cfg.batch_size
  resume_from:
    bind_to: processor_cfg.resume_from
    help: the checkpoint file to resume from

processor_cfg:
  type: 'processor.recognition.train'
  workers: 16

  # model setting
  model_cfg:
    type: 'models.backbones.ST_GCN_18'
    in_channels: 3
    num_class: 10
    edge_importance_weighting: True
    graph_cfg:
      layout: 'coco'
      strategy: 'spatial'
  loss_cfg:
    type: 'torch.nn.CrossEntropyLoss'

  # dataset setting
  dataset_cfg:
    # training set
    - type: "datasets.DataPipeline"
      data_source:
        type: "datasets.SkeletonLoader"
        data_dir: ./data/actions_as_space_time_shapes
        num_track: 2
        num_keypoints: 17
        repeat: 20
      pipeline:
        - {type: "datasets.skeleton.normalize_by_resolution"}
        - {type: "datasets.skeleton.mask_by_visibility"}
        - {type: "datasets.skeleton.pad_zero", size: 150}
        - {type: "datasets.skeleton.random_crop", size: 150}
        - {type: "datasets.skeleton.simulate_camera_moving"}
        - {type: "datasets.skeleton.transpose", order: [0, 2, 1, 3]}
        - {type: "datasets.skeleton.to_tuple"}
    - type: "datasets.DataPipeline"
      data_source:
        type: "datasets.SkeletonLoader"
        data_dir: ./data/actions_as_space_time_shapes
        num_track: 2
        num_keypoints: 17
      pipeline:
        - {type: "datasets.skeleton.normalize_by_resolution"}
        - {type: "datasets.skeleton.mask_by_visibility"}
        - {type: "datasets.skeleton.pad_zero", size: 300}
        - {type: "datasets.skeleton.random_crop", size: 300}
        - {type: "datasets.skeleton.transpose", order: [0, 2, 1, 3]}
        - {type: "datasets.skeleton.to_tuple"}

  # dataloader setting
  batch_size: 16
  gpus: 4

  # optimizer setting
  optimizer_cfg:
    type: 'torch.optim.SGD'
    lr: 0.1
    momentum: 0.9
    nesterov: true
    weight_decay: 0.0001

  # runtime setting
  workflow: [['train', 5], ['val', 1]]
  work_dir: ./work_dir/recognition/st_gcn/actions_as_space_time_shapes
  total_epochs: 65
  training_hooks:
    lr_config:
      policy: 'step'
      step: [20, 30, 40, 50]
    log_config:
      interval: 100
      hooks:
        - type: TextLoggerHook
    checkpoint_config:
      interval: 5
    optimizer_config:
      grad_clip:
  resume_from:
  load_from:
```
Thanks! I will try my training again and show my results.
Great, thank you 👍! So the only difference is adding the grad_clip option?
Yes
I can't thank the people on this thread who found the error enough!
Hey,
Did anyone get this error after adding code to runner.py?
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
Hey,
Does it work for you?
About the RuntimeError: it looks like backward is being called twice? I did not meet this error before...
The process is still running and the loss function is decreasing, which was not the case before the following modification:
- Add grad_clip: under optimizer_config: in the train.yaml file
I did not change anything under /share/jiawenhao/miniconda3/lib/python3.7/site-packages/mmcv-0.4.3-py3.7-linux-x86_64.egg/mmcv/runner/hooks/optimizer.py
@CamilleMaurice @jiawenhao2015 Ok. Thank you.
Hey,
Does anyone have an idea how to get the result on a single video with a trained model?
@rashidch Have you tried to create a configuration file similar to test.yaml ?
Yeah.
@rashidch Then you are able to get the result on a single video for a trained model using test.yaml, but you are looking for a more flexible way?
Right now, I only get the test accuracy on the test data. I have not implemented single-video inference yet. I want to implement it, but I have been a little busy.
I want to implement single-video inference where we can show, frame by frame, the actions recognized by the system in the video.
Thank you very much. It worked for me.
Out of curiosity, what is grad_clip?
@vivek87799 Did you ever get your answer? I want to know what grad_clip is, too.
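For readers landing here with the same question, a hedged note (not from the original thread): in mmcv's OptimizerHook, grad_clip is an optional dict of keyword arguments forwarded to torch.nn.utils.clip_grad_norm_ before optimizer.step(); when it is left empty (None), clipping is simply skipped. In this thread, its apparent effect was just to make optimizer_config a well-formed mapping so the hook gets built at all. A standalone demo of what the clipping itself does (the model and numbers below are made up for illustration):

```python
# Illustrative only: mimics the effect of grad_clip: {max_norm: 20, norm_type: 2}
# in the training config. The model and data are throwaway examples.
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(4, 10)).sum()
loss.backward()

# Rescale gradients so their global L2 norm is at most 20 before the update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=20, norm_type=2)
optimizer.step()
```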