How to use a custom dataset for an action recognition task?
Branch
main branch (1.x version, such as v1.0.0, or dev-1.x branch)
Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] I have read the documentation but cannot get the expected help.
- [X] The bug has not been fixed in the latest version.
Environment
System environment:
sys.platform: linux
Python: 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 738065409
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.0, V11.0.221
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PyTorch: 1.7.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.0
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.0.3
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.8.1
OpenCV: 4.8.0
MMEngine: 0.8.3
Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 738065409
diff_rank_seed: False
deterministic: False
Distributed launcher: none
Distributed training: False
GPU number: 1
Describe the bug
Traceback (most recent call last):
File "tools/train.py", line 135, in
Reproduces the problem - code sample
The following is the complete config code:

ann_file_train = '/mmaction2/classroomactionvideo/train.txt'
ann_file_val = '/mmaction2/classroomactionvideo/val.txt'
auto_scale_lr = dict(base_batch_size=256, enable=False)
data_root = '/mmaction2/classroomactionvideo/train/'
data_root_val = '/mmaction2/classroomactionvideo/val/'
dataset_type = 'VideoDataset'
default_hooks = dict(
    checkpoint=dict(interval=1, save_best='auto', type='CheckpointHook'),
    logger=dict(ignore_last=False, interval=20, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    runtime_info=dict(type='RuntimeInfoHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    sync_buffers=dict(type='SyncBuffersHook'),
    timer=dict(type='IterTimerHook'))
default_scope = 'mmaction'
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
file_client_args = dict(io_backend='disk')
launcher = 'none'
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=True, type='LogProcessor', window_size=20)
model = dict(
    backbone=dict(
        depth=50,
        norm_eval=False,
        pretrained='https://download.pytorch.org/models/resnet50-11ad3fa6.pth',
        type='ResNet'),
    cls_head=dict(
        average_clips='prob',
        consensus=dict(dim=1, type='AvgConsensus'),
        dropout_ratio=0.4,
        in_channels=2048,
        init_std=0.01,
        num_classes=7,
        spatial_type='avg',
        type='TSNHead'),
    data_preprocessor=dict(
        format_shape='NCHW',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        type='ActionDataPreprocessor'),
    test_cfg=None,
    train_cfg=None,
    type='Recognizer2D')
optim_wrapper = dict(
    clip_grad=dict(max_norm=40, norm_type=2),
    optimizer=dict(lr=0.005, momentum=0.9, type='SGD', weight_decay=0.0001))
param_scheduler = [
    dict(
        begin=0,
        by_epoch=True,
        end=50,
        gamma=0.1,
        milestones=[20, 40],
        type='MultiStepLR'),
]
randomness = dict(deterministic=False, diff_rank_seed=False, seed=None)
resume = False
test_cfg = dict(type='TestLoop')
test_dataloader = dict(
    batch_size=1,
    dataset=dict(
        ann_file='/mmaction2/classroomactionvideo/val.txt',
        data_prefix=dict(video='/mmaction2/classroomactionvideo/val/'),
        pipeline=[
            dict(io_backend='disk', type='DecordInit'),
            dict(clip_len=1, frame_interval=1, num_clips=25, test_mode=True, type='SampleFrames'),
            dict(type='DecordDecode'),
            dict(scale=(-1, 256), type='Resize'),
            dict(crop_size=224, type='TenCrop'),
            dict(input_format='NCHW', type='FormatShape'),
            dict(type='PackActionInputs'),
        ],
        test_mode=True,
        type='VideoDataset'),
    num_workers=2,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(type='AccMetric')
test_pipeline = [
    dict(io_backend='disk', type='DecordInit'),
    dict(clip_len=1, frame_interval=1, num_clips=25, test_mode=True, type='SampleFrames'),
    dict(type='DecordDecode'),
    dict(scale=(-1, 256), type='Resize'),
    dict(crop_size=224, type='TenCrop'),
    dict(input_format='NCHW', type='FormatShape'),
    dict(type='PackActionInputs'),
]
train_cfg = dict(max_epochs=50, type='EpochBasedTrainLoop', val_begin=1, val_interval=1)
train_dataloader = dict(
    batch_size=32,
    dataset=dict(
        ann_file='/mmaction2/classroomactionvideo/train.txt',
        data_prefix=dict(video='/mmaction2/classroomactionvideo/train/'),
        pipeline=[
            dict(io_backend='disk', type='DecordInit'),
            dict(clip_len=1, frame_interval=1, num_clips=3, type='SampleFrames'),
            dict(type='DecordDecode'),
            dict(scale=(-1, 256), type='Resize'),
            dict(input_size=224, max_wh_scale_gap=1, random_crop=False, scales=(1, 0.875, 0.75, 0.66), type='MultiScaleCrop'),
            dict(keep_ratio=False, scale=(224, 224), type='Resize'),
            dict(flip_ratio=0.5, type='Flip'),
            dict(input_format='NCHW', type='FormatShape'),
            dict(type='PackActionInputs'),
        ],
        type='VideoDataset'),
    num_workers=2,
    persistent_workers=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))
train_pipeline = [
    dict(io_backend='disk', type='DecordInit'),
    dict(clip_len=1, frame_interval=1, num_clips=3, type='SampleFrames'),
    dict(type='DecordDecode'),
    dict(scale=(-1, 256), type='Resize'),
    dict(input_size=224, max_wh_scale_gap=1, random_crop=False, scales=(1, 0.875, 0.75, 0.66), type='MultiScaleCrop'),
    dict(keep_ratio=False, scale=(224, 224), type='Resize'),
    dict(flip_ratio=0.5, type='Flip'),
    dict(input_format='NCHW', type='FormatShape'),
    dict(type='PackActionInputs'),
]
val_cfg = dict(type='ValLoop')
val_dataloader = dict(
    batch_size=32,
    dataset=dict(
        ann_file='/mmaction2/classroomactionvideo/val.txt',
        data_prefix=dict(video='/mmaction2/classroomactionvideo/val/'),
        pipeline=[
            dict(io_backend='disk', type='DecordInit'),
            dict(clip_len=1, frame_interval=1, num_clips=3, test_mode=True, type='SampleFrames'),
            dict(type='DecordDecode'),
            dict(scale=(-1, 256), type='Resize'),
            dict(crop_size=224, type='CenterCrop'),
            dict(input_format='NCHW', type='FormatShape'),
            dict(type='PackActionInputs'),
        ],
        test_mode=True,
        type='VideoDataset'),
    num_workers=2,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(type='AccMetric')
val_pipeline = [
    dict(io_backend='disk', type='DecordInit'),
    dict(clip_len=1, frame_interval=1, num_clips=3, test_mode=True, type='SampleFrames'),
    dict(type='DecordDecode'),
    dict(scale=(-1, 256), type='Resize'),
    dict(crop_size=224, type='CenterCrop'),
    dict(input_format='NCHW', type='FormatShape'),
    dict(type='PackActionInputs'),
]
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(type='ActionVisualizer', vis_backends=[dict(type='LocalVisBackend')])
work_dir = './work_dirs/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_classroom'
Reproduces the problem - command or script
python tools/train.py checkpiont/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_classroom.py
Reproduces the problem - error message
Traceback (most recent call last):
File "tools/train.py", line 135, in
Additional information
dataset path:
classroomactionvideo/
├── train/
├── val/
├── train.txt
└── val.txt
train.txt and val.txt here contain multiple lines of the form: drinking (1).mp4 1
I am a beginner and am currently working through the model fine-tuning section of the MMAction2 user guide. I want to train action recognition on my own dataset with VideoDataset, but after running training I get ValueError: too many values to unpack (expected 2). Could you tell me what is wrong and how to fix it? Thanks.
The error message indicates that there are too many values in an annotation line. Please check and make sure that your annotation file follows this format:
some/path/000.mp4 1
some/path/001.mp4 1
...
You can refer to the documentation for details.
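For context, the annotation is parsed roughly like the sketch below (a simplified illustration of how a whitespace-delimited annotation file is read, not the exact MMAction2 source); any extra whitespace-separated token in a line triggers the unpacking error:

# Simplified sketch: read a whitespace-delimited annotation file.
def load_annotations(ann_file):
    samples = []
    with open(ann_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # A file name containing a space, e.g. "drinking (31).mp4 1",
            # splits into three tokens here and raises
            # ValueError: too many values to unpack (expected 2)
            filename, label = line.split()
            samples.append(dict(filename=filename, label=int(label)))
    return samples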
@cir7 Do you mean something like classroomvideodataset/train/001.mp4 1? After I changed it to this format, I still get the original error.
Could you paste part of your annotation here?
Okay, please see where I went wrong.
train
drinking (31).mp4 1
drinking (32).mp4 1
drinking (33).mp4 1
drinking (34).mp4 1
.......
lecture (31).mp4 2
lecture (32).mp4 2
lecture (33).mp4 2
lecture (34).mp4 2
.....
train.txt
classroomvideo/train/drinking (31).mp4 1
classroomvideo/train/drinking (32).mp4 1
classroomvideo/train/drinking (33).mp4 1
classroomvideo/train/drinking (34).mp4 1
.......
classroomvideo/train/lecture (31).mp4 2
classroomvideo/train/lecture (32).mp4 2
classroomvideo/train/lecture (33).mp4 2
classroomvideo/train/lecture (34).mp4 2
.....
ChatGPT tells me the correct format is as follows; is that right?
dataset/
├── train/
│ ├── class1/
│ │ ├── video1.mp4
│ │ ├── video2.mp4
│ │ └── ...
│ ├── class2/
│ │ ├── video3.mp4
│ │ ├── video4.mp4
│ │ └── ...
│ └── ...
├── val/
│ ├── class1/
│ │ ├── video5.mp4
│ │ ├── video6.mp4
│ │ └── ...
│ ├── class2/
│ │ ├── video7.mp4
│ │ ├── video8.mp4
│ │ └── ...
│ └── ...
├── train.txt
├── val.txt
train.txt
dataset/train/class1/video1.mp4 0
dataset/train/class1/video2.mp4 0
...
dataset/train/class2/video3.mp4 1
dataset/train/class2/video4.mp4 1
...
val.txt
dataset/val/class1/video5.mp4 0
dataset/val/class1/video6.mp4 0
...
dataset/val/class2/video7.mp4 1
dataset/val/class2/video8.mp4 1
...
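For reference, an annotation file in that format could be generated with a small helper script like the one below (an illustrative sketch, not part of MMAction2; the class names and label numbers are assumptions):

import os

def write_annotation(split_dir, out_file, class_to_label):
    # Walk dataset/<split>/<class>/*.mp4 and write "<path> <label>" lines.
    lines = []
    for class_name, label in class_to_label.items():
        class_dir = os.path.join(split_dir, class_name)
        for name in sorted(os.listdir(class_dir)):
            if name.endswith('.mp4'):
                lines.append(f'{os.path.join(split_dir, class_name, name)} {label}')
    with open(out_file, 'w') as f:
        f.write('\n'.join(lines) + '\n')

# Hypothetical usage with the layout above:
write_annotation('dataset/train', 'dataset/train.txt', {'class1': 0, 'class2': 1})
write_annotation('dataset/val', 'dataset/val.txt', {'class1': 0, 'class2': 1})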
@cir7 Thank you for your reply
There are whitespace characters in your video file names, and by default we use whitespace as the split delimiter, which results in more values than expected. I suggest removing the whitespace from the video file names. Alternatively, you can use ',' as the delimiter and modify the config file and your annotation accordingly.
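If you go with the delimiter route, the change would look roughly like this (a sketch only; check the VideoDataset signature in your MMAction2 version for the exact delimiter argument name):

# Sketch: use ',' between the path and the label instead of whitespace.
train_dataloader = dict(
    batch_size=32,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(shuffle=True, type='DefaultSampler'),
    dataset=dict(
        type='VideoDataset',
        ann_file='/mmaction2/classroomactionvideo/train.txt',
        data_prefix=dict(video='/mmaction2/classroomactionvideo/train/'),
        delimiter=',',  # assumed argument name; verify against your version
        pipeline=train_pipeline))

# The annotation lines would then look like:
# drinking (31).mp4,1
# lecture (31).mp4,2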
I followed your advice and removed all the spaces, but after running I still get an error that the MP4 file cannot be found. For example: FileNotFoundError: [Errno 2] No such file or directory: '/mmaction2/mmaction/datasets/classvideo/train/play_phone(2).mp4'
The following is my train.txt:
train/drinking(1).mp4 1
train/drinking(2).mp4 1
train/drinking(3).mp4 1
....
train/lecture(1).mp4 2
train/lecture(2).mp4 2
train/lecture(3).mp4 2
....
And the following is the new error log.
Loads checkpoint by http backend from path: https://download.pytorch.org/models/resnet50-11ad3fa6.pth
08/30 17:03:18 - mmengine - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.bias', 'fc.weight'}
08/30 17:03:19 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
08/30 17:03:19 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
08/30 17:03:19 - mmengine - INFO - Checkpoints will be saved to /mmaction2/work_dirs/tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_classroom.
Traceback (most recent call last):
File "tools/train.py", line 135, in
Please make sure the file exists: /mmaction2/mmaction/datasets/classvideo/train/play_phone(2).mp4. Maybe you forgot to rename the video files?
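As a quick sanity check, something like the following can list every annotation entry whose resolved path is missing, and optionally strip spaces from the actual file names (the paths mirror the config above; treat this as an illustrative sketch, not a fix):

import os

data_prefix = '/mmaction2/classroomactionvideo/train/'  # video root from the config
ann_file = '/mmaction2/classroomactionvideo/train.txt'

# 1. Report annotation entries whose video file does not exist on disk.
with open(ann_file) as f:
    for line in f:
        rel_path, _label = line.strip().rsplit(' ', 1)
        full_path = os.path.join(data_prefix, rel_path)
        if not os.path.exists(full_path):
            print('missing:', full_path)

# 2. Optionally remove spaces from the real file names so they match the annotation.
for root, _dirs, files in os.walk(data_prefix):
    for name in files:
        if ' ' in name:
            os.rename(os.path.join(root, name),
                      os.path.join(root, name.replace(' ', '')))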
I renamed each file by hand, and I also rechecked the file names and removed the spaces. Besides modifying my own config, is there anything else that needs to be changed? The official tutorial is not very detailed on this point; could you give some guidance?