PaddleVideo
ppTSM training speed becomes extremely slow during training
paddle2.0, python3.7, GPU: V100, config: frame, dense, preciseBN, custom dataset
Problem description: with 6 GPUs, batch_size=24, num_workers=3, training becomes extremely slow at around epoch 16, to the point where a single epoch cannot finish overnight; with 4 GPUs, batch_size=24, num_workers=4, the same extreme slowdown appears at around epoch 5.
I have no idea what is causing it.
Is the data format video or frame?
Is the speed normal at first and only slows down after a few epochs?
The data is my own custom dataset in Frame format. The speed is normal at the beginning; once it becomes extremely slow, I interrupt training with Ctrl+C and resume, after which the speed returns to normal, and then a few epochs later it becomes extremely slow again. This cycle repeats.
Could you paste your config file?
MODEL: #MODEL field
    framework: "Recognizer2D" #Mandatory, indicate the type of network, associate to the 'paddlevideo/modeling/framework/'.
    backbone: #Mandatory, indicate the type of backbone, associate to the 'paddlevideo/modeling/backbones/'.
        name: "ResNetTweaksTSM" #Mandatory, the name of backbone.
        pretrained: "data/ResNet50_vd_ssld_v2_pretrained.pdparams" #Optional, pretrained model path.
        depth: 50 #Optional, the depth of backbone architecture.
    head:
        name: "ppTSMHead" #Mandatory, indicate the type of head, associate to the 'paddlevideo/modeling/heads'
        num_classes: 80 #101 #Optional, the number of classes to be classified.
        in_channels: 2048 #input channel of the extracted feature.
        drop_ratio: 0.5 #the ratio of dropout
        std: 0.01 #std value in params initialization
        ls_eps: 0.1
DATASET: #DATASET field
    batch_size: 24 #Mandatory, batch size
    num_workers: 4 #Mandatory, the number of subprocesses on each GPU.
    test_batch_size: 1 #Mandatory, test batch size
    train:
        format: "FrameDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
        data_prefix: "" #Mandatory, train data root path
        file_path: "XXXX" #Mandatory, train data index file path
        suffix: 'img_{:05}.jpg'
    valid:
        format: "FrameDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
        data_prefix: "" #Mandatory, valid data root path
        file_path: "XXX" #Mandatory, valid data index file path
        suffix: 'img_{:05}.jpg'
    test:
        format: "FrameDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
        data_prefix: "" #Mandatory, test data root path
        file_path: "XXXX" #Mandatory, test data index file path
        suffix: 'img_{:05}.jpg'
PIPELINE: #PIPELINE field
    train: #Mandatory, indicate the pipeline to deal with the training data, associate to the 'paddlevideo/loader/pipelines/'
        decode:
            name: "FrameDecoder"
        sample:
            name: "Sampler"
            num_seg: 8
            seg_len: 1
            valid_mode: False
            dense_sample: True
        transform: #Mandatory, image transform operators
            - Scale:
                short_size: 256
            - MultiScaleCrop:
                target_size: 256
            - RandomCrop:
                target_size: 224
            - RandomFlip:
            - Image2Array:
            - Normalization:
                mean: [0.485, 0.456, 0.406]
                std: [0.229, 0.224, 0.225]
    valid: #Mandatory, indicate the pipeline to deal with the validation data, associate to the 'paddlevideo/loader/pipelines/'
        decode:
            name: "FrameDecoder"
        sample:
            name: "Sampler"
            num_seg: 8
            seg_len: 1
            valid_mode: True
        transform:
            - Scale:
                short_size: 256
            - CenterCrop:
                target_size: 224
            - Image2Array:
            - Normalization:
                mean: [0.485, 0.456, 0.406]
                std: [0.229, 0.224, 0.225]
    test:
        decode:
            name: "FrameDecoder"
        sample:
            name: "Sampler"
            num_seg: 8
            seg_len: 1
            valid_mode: True
            dense_sample: True
        transform:
            - Scale:
                short_size: 256
            - GroupFullResSample:
                crop_size: 224
            - Image2Array:
            - Normalization:
                mean: [0.485, 0.456, 0.406]
                std: [0.229, 0.224, 0.225]
OPTIMIZER: #OPTIMIZER field
    name: 'Momentum'
    momentum: 0.9
    learning_rate:
        iter_step: True
        name: 'CustomWarmupCosineDecay'
        max_epoch: 80
        warmup_epochs: 5
        warmup_start_lr: 0.005
        cosine_base_lr: 0.01
    weight_decay:
        name: 'L2'
        value: 1e-4
    use_nesterov: True
MIX:
    name: "Mixup"
    alpha: 0.2
PRECISEBN:
    preciseBN_interval: 5 # epoch interval to do preciseBN, default 1.
    num_iters_preciseBN: 200 # how many batches used to do preciseBN, default 200.
METRIC:
    name: 'CenterCropMetric'
INFERENCE:
    name: 'ppTSM_Inference_helper'
    num_seg: 8
    target_size: 224
model_name: "ppTSM"
log_interval: 5 #Optional, the interval of the logger, default: 10
epochs: 80 #Mandatory, total epochs
log_level: "INFO" #Optional, the logger level, default: "INFO"
We ran the PP-TSM model with the latest PaddleVideo code under the default dense config on K400 and could not reproduce the gradual training slowdown. Please try the following first:
(1) In your custom config the batch_size is fairly large, which may cause a GPU memory leak in individual ops; try reducing the batch size (24 --> 8).
(2) Check whether the preciseBN code in the PaddleVideo you are running has no_grad added: https://github.com/PaddlePaddle/PaddleVideo/blob/aed8d3ce42e065c3a307cbec6530f99a1e8466a1/paddlevideo/utils/precise_bn.py#L25
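For reference, a minimal Python sketch of what that check is about, assuming a do_preciseBN-style loop (function and argument names below are illustrative, not the repository's exact code): the whole BN-statistics refresh loop should run under paddle.no_grad(), otherwise every preciseBN forward pass builds an autograd graph and memory/compute accumulate.

import itertools
import paddle

@paddle.no_grad()  # the key point: no autograd graph is built while BN statistics are refreshed
def refresh_bn_stats(model, data_loader, num_iters=200):
    # Illustrative preciseBN-style loop: forward passes only, used to
    # re-estimate the running mean/variance of the BatchNorm layers.
    model.train()  # BN layers must be in train mode so their running stats get updated
    for data in itertools.islice(data_loader, num_iters):
        model(data)  # forward only; no loss, no backward, no optimizer step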
To help others avoid this pitfall:
(1) It is now confirmed that the slowdown is caused by training with paddle.distributed.launch; in my tests, running main.py directly on a single GPU is faster.
(2) I also noticed a pattern: the more times training is launched with paddle.distributed.launch, the slower it gets, for reasons unknown.
(3) I tried switching to dist.spawn instead (see the sketch below for how it is normally used), but ran into all kinds of strange errors, so I gave up.
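For context on point (3): paddle.distributed.spawn is the in-script alternative to the paddle.distributed.launch command line. A minimal sketch of how it is typically wired up, assuming a placeholder train() entry point (this is not PaddleVideo's actual main.py):

import paddle
import paddle.distributed as dist

def train(config_path):
    # each spawned worker process initializes its own process group
    dist.init_parallel_env()
    # ... build the model here, wrap it with paddle.DataParallel,
    #     then run the usual training loop on this worker's data shard ...

if __name__ == '__main__':
    # start one worker process per GPU instead of using `-m paddle.distributed.launch`
    dist.spawn(train, args=('path/to/config.yaml',), nprocs=2)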
Training still gets slower and slower over time. After all this time, why hasn't the Paddle team fixed this yet?
Which Paddle version are you using?
paddle-bfloat 0.1.2
paddlepaddle-gpu 2.3.0.post111
ppvideo 2.3.0
The model being trained is ppTSM-ResNet50.
That version should be fine. The current ETA calculation has a known issue, so the estimated remaining time may not be accurate; could you check whether the ips (instances per second) is actually decreasing over time?
I verified that training gets slower and slower when preciseBN is enabled; without it, the speed stays normal.
So how can I do single-machine multi-GPU training without it getting slower and slower?
python3.7 -B -m paddle.distributed.launch --gpus="0,1" --log_dir=pptsm_frames_dense main.py --validate -c path/to/config.yaml