mmpretrain

train problem

Open cj17SUI opened this issue 2 years ago • 3 comments

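We recommend using the English template "General question" so that your question can help more people.

Please confirm the following first

  • I have searched the related issues but could not find the help I need.
  • I have read the relevant documentation but still don't know how to solve the problem.

Describe the problem you encountered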

ValueError: num_samples should be a positive integer value, but got num_samples=0
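
As the traceback further down shows, this message comes from PyTorch's `RandomSampler`, which refuses a data source whose length is zero. A minimal sketch of a pre-flight check that fails earlier and with a more pointed message follows; the helper name `require_non_empty` is hypothetical and not part of mmcls or PyTorch:

```python
from typing import Sized


def require_non_empty(dataset: Sized, name: str = "train") -> int:
    """Return len(dataset), or raise a descriptive error if it is empty.

    Hypothetical helper for illustration only; not an mmcls or PyTorch API.
    """
    n = len(dataset)
    if n == 0:
        raise ValueError(
            f"{name} dataset is empty: check ann_file, data_prefix and "
            "the dataset's load_annotations()"
        )
    return n
```

Calling it on the dataset object before the dataloader is built would point at the dataset rather than at the sampler.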

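Related information

  1. Output of the command pip list | grep "mmcv\|mmcls\|^torch": [fill in here]
  2. If you modified a config file or used a new one, please describe it here:
[fill in here]
  3. If you hit the problem during training, please provide the complete training log and error message

Training log: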

model = dict(
    type='ImageClassifier',
    backbone=dict(type='MobileNetV2', widen_factor=1.0),
    neck=dict(type='GlobalAveragePooling'),
    head=dict(
        type='LinearClsHead',
        num_classes=10,
        in_channels=1280,
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
        topk=(1, 5)))
dataset_type = 'Lidar_Z'
train_pipeline = [
    dict(type='LoadImageFromFile', to_float32=True),
    dict(type='Resize', size=320),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
    dict(type='LoadImageFromFile', to_float32=True),
    dict(type='Resize', size=320),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img'])
]
data = dict(
    samples_per_gpu=16,
    workers_per_gpu=1,
    train=dict(
        type='Lidar_Z',
        data_prefix='E:\dataset',
        ann_file='FILELIST_TRAIN.txt',
        pipeline=[
            dict(type='LoadImageFromFile', to_float32=True),
            dict(type='Resize', size=320),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='ToTensor', keys=['gt_label']),
            dict(type='Collect', keys=['img', 'gt_label'])
        ]),
    val=dict(
        type='Lidar_Z',
        data_prefix='E:\dataset',
        ann_file='FILELIST_TRAIN.txt',
        pipeline=[
            dict(type='LoadImageFromFile', to_float32=True),
            dict(type='Resize', size=320),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ]),
    test=dict(
        type='Lidar_Z',
        data_prefix='E:\dataset',
        ann_file='FILELIST_TRAIN.txt',
        pipeline=[
            dict(type='LoadImageFromFile', to_float32=True),
            dict(type='Resize', size=320),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ]))
evaluation = dict(interval=1, metric='accuracy')
optimizer = dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(policy='step', step=[30, 60, 90])
runner = dict(type='EpochBasedRunner', max_epochs=100)
checkpoint_config = dict(interval=1)
log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
work_dir = './work_dirs\mobilenet_v2_b10x8_jcZhou'
gpu_ids = [0]

Error message:

Traceback (most recent call last):
  File "tools/train.py", line 205, in <module>
    main()
  File "tools/train.py", line 193, in main
    train_model(
  File "e:\mmclassification\mmcls\apis\train.py", line 125, in train_model
    data_loaders = [build_dataloader(ds, **train_loader_cfg) for ds in dataset]
  File "e:\mmclassification\mmcls\apis\train.py", line 125, in <listcomp>
    data_loaders = [build_dataloader(ds, **train_loader_cfg) for ds in dataset]
  File "e:\mmclassification\mmcls\datasets\builder.py", line 156, in build_dataloader
    data_loader = DataLoader(
  File "D:\Anaconda\envs\Pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 270, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
  File "D:\Anaconda\envs\Pytorch\lib\site-packages\torch\utils\data\sampler.py", line 102, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
  4. If you made other relevant changes to the code under the mmcls folder, please describe them here: [fill in here]

cj17SUI, Aug 03 '22 01:08

I think your dataset implementation has a problem that causes the dataset to be empty. Did you implement it according to https://mmclassification.readthedocs.io/en/master/tutorials/new_dataset.html#create-a-new-dataset-class ?
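
For readers following along: as far as the mmcls 0.x tutorial linked above goes, a custom dataset's `load_annotations` is expected to return a list of per-sample dicts, and an empty list is what ends up as `num_samples=0`. Below is a dependency-free sketch of that parsing step, assuming each line of the annotation file has the form `<relative image path> <integer label>`. The function name `load_filelist` and the skipping of blank lines are assumptions for illustration; in a real mmcls dataset this logic would sit inside a `BaseDataset` subclass, and the tutorial stores the label as a NumPy integer array rather than a plain `int`.

```python
from pathlib import Path
from typing import Dict, List


def load_filelist(ann_file: str, data_prefix: str) -> List[Dict]:
    """Parse '<filename> <label>' lines into mmcls-style sample dicts.

    Hypothetical stand-in for a dataset's load_annotations(); not mmcls API.
    """
    data_infos = []
    lines = Path(ann_file).read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split on the last run of whitespace so filenames may contain spaces.
        parts = line.rsplit(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(
                f"{ann_file}:{line_no}: expected '<filename> <label>', got {line!r}"
            )
        filename, label = parts
        data_infos.append({
            "img_prefix": data_prefix,
            "img_info": {"filename": filename},
            "gt_label": int(label),  # raises ValueError if the label is not an integer
        })
    return data_infos
```

If this returns an empty list for the training annotation file, the dataset length is 0 and the dataloader fails exactly as in the traceback, so printing `len(load_filelist(...))` is a quick way to tell a parsing problem from a path problem.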

mzr1996, Aug 03 '22 02:08

> I think your dataset implementation has a problem that causes the dataset to be empty. Did you implement it according to https://mmclassification.readthedocs.io/en/master/tutorials/new_dataset.html#create-a-new-dataset-class ?

I fixed it following those instructions, but now I get a different error:

OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "D:\Anaconda\envs\Pytorch\lib\site-packages\torch\lib\cudnn_adv_infer64_8.dll" or one of its dependencies.

cj17SUI, Aug 03 '22 03:08

I think it's a PyTorch installation problem

mzr1996, Aug 03 '22 03:08

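> I think your dataset implementation has a problem that causes the dataset to be empty. Did you implement it according to https://mmclassification.readthedocs.io/en/master/tutorials/new_dataset.html#create-a-new-dataset-class ?
>
> I fixed it following those instructions, but now I get a different error:
>
> OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "D:\Anaconda\envs\Pytorch\lib\site-packages\torch\lib\cudnn_adv_infer64_8.dll" or one of its dependencies.

Have you fixed this bug by now? What was causing it?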

a969366623, Dec 21 '23 02:12