PaddleVideo
ppTSM training speed becomes extremely slow during training
paddle2.0, python3.7, GPU: V100, config: frame, dense, preciseBN, custom dataset
Problem description: with 6 GPUs, batch_size=24, num_workers=3, training becomes extremely slow at around epoch 16, to the point where a single epoch cannot finish overnight; with 4 GPUs, batch_size=24, num_workers=4, the same extreme slowdown appears at around epoch 5.
I have no idea what is causing it.
Is the data format video or frame?
Is the speed normal at first and only slows down after a few epochs?
The data is my own custom dataset in Frame format. The speed is normal at the beginning; once it becomes extremely slow, I interrupt training with Ctrl+C and resume, after which the speed returns to normal, and then a few epochs later it becomes extremely slow again. This cycle repeats.
Could you paste your config file?
MODEL: #MODEL field
    framework: "Recognizer2D" #Mandatory, indicate the type of network, associate to the 'paddlevideo/modeling/framework/'.
    backbone: #Mandatory, indicate the type of backbone, associate to the 'paddlevideo/modeling/backbones/'.
        name: "ResNetTweaksTSM" #Mandatory, the name of backbone.
        pretrained: "data/ResNet50_vd_ssld_v2_pretrained.pdparams" #Optional, pretrained model path.
        depth: 50 #Optional, the depth of backbone architecture.
    head:
        name: "ppTSMHead" #Mandatory, indicate the type of head, associate to the 'paddlevideo/modeling/heads'
        num_classes: 80 #101 #Optional, the number of classes to be classified.
        in_channels: 2048 #input channel of the extracted feature.
        drop_ratio: 0.5 #the ratio of dropout
        std: 0.01 #std value in params initialization
        ls_eps: 0.1
DATASET: #DATASET field
    batch_size: 24 #Mandatory, batch size
    num_workers: 4 #Mandatory, the number of subprocesses on each GPU.
    test_batch_size: 1 #Mandatory, test batch size
    train:
        format: "FrameDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
        data_prefix: "" #Mandatory, train data root path
        file_path: "XXXX" #Mandatory, train data index file path
        suffix: 'img_{:05}.jpg'
    valid:
        format: "FrameDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
        data_prefix: "" #Mandatory, valid data root path
        file_path: "XXX" #Mandatory, valid data index file path
        suffix: 'img_{:05}.jpg'
    test:
        format: "FrameDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
        data_prefix: "" #Mandatory, test data root path
        file_path: "XXXX" #Mandatory, test data index file path
        suffix: 'img_{:05}.jpg'
PIPELINE: #PIPELINE field
    train: #Mandatory, indicate the pipeline to deal with the training data, associate to the 'paddlevideo/loader/pipelines/'
        decode:
            name: "FrameDecoder"
        sample:
            name: "Sampler"
            num_seg: 8
            seg_len: 1
            valid_mode: False
            dense_sample: True
        transform: #Mandatory, image transform operators
            - Scale:
                short_size: 256
            - MultiScaleCrop:
                target_size: 256
            - RandomCrop:
                target_size: 224
            - RandomFlip:
            - Image2Array:
            - Normalization:
                mean: [0.485, 0.456, 0.406]
                std: [0.229, 0.224, 0.225]
    valid: #Mandatory, indicate the pipeline to deal with the validation data, associate to the 'paddlevideo/loader/pipelines/'
        decode:
            name: "FrameDecoder"
        sample:
            name: "Sampler"
            num_seg: 8
            seg_len: 1
            valid_mode: True
        transform:
            - Scale:
                short_size: 256
            - CenterCrop:
                target_size: 224
            - Image2Array:
            - Normalization:
                mean: [0.485, 0.456, 0.406]
                std: [0.229, 0.224, 0.225]
    test:
        decode:
            name: "FrameDecoder"
        sample:
            name: "Sampler"
            num_seg: 8
            seg_len: 1
            valid_mode: True
            dense_sample: True
        transform:
            - Scale:
                short_size: 256
            - GroupFullResSample:
                crop_size: 224
            - Image2Array:
            - Normalization:
                mean: [0.485, 0.456, 0.406]
                std: [0.229, 0.224, 0.225]
OPTIMIZER: #OPTIMIZER field
    name: 'Momentum'
    momentum: 0.9
    learning_rate:
        iter_step: True
        name: 'CustomWarmupCosineDecay'
        max_epoch: 80
        warmup_epochs: 5
        warmup_start_lr: 0.005
        cosine_base_lr: 0.01
    weight_decay:
        name: 'L2'
        value: 1e-4
    use_nesterov: True
MIX:
    name: "Mixup"
    alpha: 0.2
PRECISEBN:
    preciseBN_interval: 5 # epoch interval to do preciseBN, default 1.
    num_iters_preciseBN: 200 # how many batches used to do preciseBN, default 200.
METRIC:
    name: 'CenterCropMetric'
INFERENCE:
    name: 'ppTSM_Inference_helper'
    num_seg: 8
    target_size: 224
model_name: "ppTSM"
log_interval: 5 #Optional, the interval of the logger, default: 10
epochs: 80 #Mandatory, total epochs
log_level: "INFO" #Optional, the logger level, default: "INFO"
We ran the PP-TSM model with the latest PaddleVideo code under the default dense config on K400 and could not reproduce the gradual training slowdown. Please try the following first:
(1) In your custom config the batch_size is fairly large, which may cause a GPU memory leak in individual ops; try reducing the batch size (24 --> 8).
(2) Check whether the preciseBN code in the PaddleVideo you are running has no_grad added: https://github.com/PaddlePaddle/PaddleVideo/blob/aed8d3ce42e065c3a307cbec6530f99a1e8466a1/paddlevideo/utils/precise_bn.py#L25
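For reference, a minimal Python sketch of what that check is about, assuming a do_preciseBN-style loop (function and argument names below are illustrative, not the repository's exact code): the whole BN-statistics refresh loop should run under paddle.no_grad(), otherwise every preciseBN forward pass builds an autograd graph and memory/compute accumulate.

import itertools
import paddle

@paddle.no_grad()  # the key point: no autograd graph is built while BN statistics are refreshed
def refresh_bn_stats(model, data_loader, num_iters=200):
    # Illustrative preciseBN-style loop: forward passes only, used to
    # re-estimate the running mean/variance of the BatchNorm layers.
    model.train()  # BN layers must be in train mode so their running stats get updated
    for data in itertools.islice(data_loader, num_iters):
        model(data)  # forward only; no loss, no backward, no optimizer step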
To help others avoid this pitfall:
(1) It is now confirmed that the slowdown is caused by training with paddle.distributed.launch; in my tests, running main.py directly on a single GPU is faster.
(2) I also noticed a pattern: the more times training is launched with paddle.distributed.launch, the slower it gets, for reasons unknown.
(3) I tried switching to dist.spawn instead (see the sketch below for how it is normally used), but ran into all kinds of strange errors, so I gave up.
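For context on point (3): paddle.distributed.spawn is the in-script alternative to the paddle.distributed.launch command line. A minimal sketch of how it is typically wired up, assuming a placeholder train() entry point (this is not PaddleVideo's actual main.py):

import paddle
import paddle.distributed as dist

def train(config_path):
    # each spawned worker process initializes its own process group
    dist.init_parallel_env()
    # ... build the model here, wrap it with paddle.DataParallel,
    #     then run the usual training loop on this worker's data shard ...

if __name__ == '__main__':
    # start one worker process per GPU instead of using `-m paddle.distributed.launch`
    dist.spawn(train, args=('path/to/config.yaml',), nprocs=2)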
Training still gets slower and slower over time. After all this time, why hasn't the Paddle team fixed this yet?
Which Paddle version are you using?
paddle-bfloat 0.1.2
paddlepaddle-gpu 2.3.0.post111
ppvideo 2.3.0
The model being trained is ppTSM-ResNet50.
That version should be fine. The current ETA calculation has a known issue, so the estimated remaining time may not be accurate; could you check whether the ips (instances per second) is actually decreasing over time?
I verified that training gets slower and slower when preciseBN is enabled; without it, the speed stays normal.
So how can I do single-machine multi-GPU training without it getting slower and slower?
python3.7 -B -m paddle.distributed.launch --gpus="0,1" --log_dir=pptsm_frames_dense main.py --validate -c path/to/config.yaml