paddlex.cls training error

Open smallwhi opened this issue 4 years ago • 2 comments

Issue type: model training

PaddleX version
paddlex==1.3.7

Problem description

====================
Error message:

aistudio@jupyter-867535-2193921:~/work$ python train.py 
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Iterable, Mapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Sized
2021-07-22 19:38:51 [INFO]      Starting to read file list from dataset...
2021-07-22 19:38:57 [INFO]      4687 samples in file Animals_det/train_list.txt
creating index...
index created!
2021-07-22 19:38:57 [INFO]      Starting to read file list from dataset...
2021-07-22 19:38:59 [INFO]      1339 samples in file Animals_det/val_list.txt
creating index...
index created!
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py:689: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  elif dtype == np.bool:
2021-07-22 19:39:02 [INFO]      Downloading ResNet101_vd_ssld_pretrained.tar from https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_ssld_pretrained.tar
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 162082/162082 [00:02<00:00, 63288.99KB/s]
2021-07-22 19:39:04 [INFO]      Decompressing ./output/pretrain/ResNet101_vd_ssld_pretrained.tar...
W0722 19:39:06.577644  2147 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0722 19:39:06.583163  2147 device_context.cc:422] device: 0, cuDNN Version: 7.6.
2021-07-22 19:39:11 [INFO]      Load pretrain weights from ./output/pretrain/ResNet101_vd_ssld_pretrained.
2021-07-22 19:39:11 [WARNING]   [SKIP] Shape of pretrained weight ./output/pretrain/ResNet101_vd_ssld_pretrained/fc_0.w_0 doesn't match.(Pretrained: (2048, 1000), Actual: (2048, 3))
2021-07-22 19:39:11 [WARNING]   [SKIP] Shape of pretrained weight ./output/pretrain/ResNet101_vd_ssld_pretrained/fc_0.b_0 doesn't match.(Pretrained: (1000,), Actual: (3,))
2021-07-22 19:39:12 [INFO]      There are 530 varaibles in ./output/pretrain/ResNet101_vd_ssld_pretrained are loaded.
Process Process-1:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 166, in _read_into_queue
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 158, in _read_into_queue
    result = mapper(sample[0], sample[1], sample[2])
TypeError: __call__() takes from 2 to 3 positional arguments but 4 were given
Process Process-2:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 166, in _read_into_queue
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 158, in _read_into_queue
    result = mapper(sample[0], sample[1], sample[2])
TypeError: __call__() takes from 2 to 3 positional arguments but 4 were given
Process Process-3:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 166, in _read_into_queue
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 158, in _read_into_queue
    result = mapper(sample[0], sample[1], sample[2])
TypeError: __call__() takes from 2 to 3 positional arguments but 4 were given
Process Process-4:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 166, in _read_into_queue
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 158, in _read_into_queue
    result = mapper(sample[0], sample[1], sample[2])
TypeError: __call__() takes from 2 to 3 positional arguments but 4 were given
Process Process-5:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 166, in _read_into_queue
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 158, in _read_into_queue
    result = mapper(sample[0], sample[1], sample[2])
TypeError: __call__() takes from 2 to 3 positional arguments but 4 were given
Process Process-6:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 166, in _read_into_queue
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 158, in _read_into_queue
    result = mapper(sample[0], sample[1], sample[2])
TypeError: __call__() takes from 2 to 3 positional arguments but 4 were given
Process Process-7:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 166, in _read_into_queue
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 158, in _read_into_queue
    result = mapper(sample[0], sample[1], sample[2])
TypeError: __call__() takes from 2 to 3 positional arguments but 4 were given
Traceback (most recent call last):
  File "train.py", line 43, in <module>
Process Process-8:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 166, in _read_into_queue
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 158, in _read_into_queue
    result = mapper(sample[0], sample[1], sample[2])
TypeError: __call__() takes from 2 to 3 positional arguments but 4 were given
    /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py:1294: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
  logging.warn('Your reader has raised an exception!')
use_vdl=True)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/models/classifier.py", line 207, in train
WARNING:root:Your reader has raised an exception!
    early_stop_patience=early_stop_patience)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/models/base.py", line 497, in train_loop
    for step, data in enumerate(self.train_data_loader()):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1251, in __next__
Exception in thread Thread-14:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1295, in __thread_main__
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1275, in __thread_main__
    for tensors in self._tensor_reader():
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 1354, in __tensor_reader_impl__
    for slots in paddle_reader():
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/data_feeder.py", line 553, in __reader_creator__
    for item in reader():
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/datasets/dataset.py", line 187, in queue_reader
    raise ValueError("multiprocess reader raises an exception")
ValueError: multiprocess reader raises an exception

    return self._reader.read_next()
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:166)

Training code:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx

from paddlex.cls import transforms

train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=224),
    transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=256),
    transforms.CenterCrop(crop_size=224),
    transforms.Normalize()
])

train_dataset = pdx.datasets.VOCDetection(
    data_dir='Animals_det',
    file_list='Animals_det/train_list.txt',
    label_list='Animals_det/labels.txt',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
    data_dir='Animals_det',
    file_list='Animals_det/val_list.txt',
    label_list='Animals_det/labels.txt',
    transforms=eval_transforms)

num_classes = len(train_dataset.labels)
model = pdx.cls.ResNet101_vd_ssld(num_classes=num_classes)
model.train(num_epochs=10,
            save_interval_epochs=2,
            train_dataset=train_dataset,
            train_batch_size=16,
            eval_dataset=eval_dataset,
            learning_rate=0.001575,
            warmup_steps=32,
            warmup_start_lr=0.0001,
            lr_decay_epochs=[2, 4, 8],
            lr_decay_gamma=0.025,
            save_dir='./output',
            use_vdl=True)

smallwhi, Jul 22 '21 11:07

For the static-graph version of PaddleX, please upgrade to 1.3.11 and try again.

pip install paddlex==1.3.11 -U

FlyingQianMM, Jul 23 '21 02:07

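Training works now. The cause was probably that I was loading the dataset in VOC format, whereas the classifier needs an ImageNet-format dataset. This is the dataset code that was at fault: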

train_dataset = pdx.datasets.VOCDetection(
    data_dir='Animals_det',
    file_list='Animals_det/train_list.txt',
    label_list='Animals_det/labels.txt',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
    data_dir='Animals_det',
    file_list='Animals_det/val_list.txt',
    label_list='Animals_det/labels.txt',
    transforms=eval_transforms)
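
For anyone who lands here with the same TypeError, the following is a minimal pure-Python sketch of how I read the mismatch. It does not use PaddleX at all. `ClsCompose`, `apply_mapper`, `try_apply`, and the sample tuples are hypothetical stand-ins that I made up to mirror the `mapper(sample[0], sample[1], sample[2])` call in the traceback. The assumption is that a detection dataset yields three fields per sample while a classification transform pipeline accepts only an image and an optional label.

```python
class ClsCompose:
    """Hypothetical stand-in for a classification transform pipeline.

    Its __call__ accepts an image and an optional label, i.e. 2 to 3
    positional arguments once `self` is counted.
    """

    def __call__(self, im, label=None):
        return (im, label)


def apply_mapper(mapper, sample):
    """Pass every field of the sample positionally, as the traceback shows."""
    return mapper(*sample)


def try_apply(mapper, sample):
    """Return 'ok' on success, or the TypeError message on failure."""
    try:
        apply_mapper(mapper, sample)
        return "ok"
    except TypeError as exc:
        return str(exc)


# Assumed sample shapes (illustrative only, not taken from PaddleX itself):
# a classification sample has two fields, a detection sample has three.
cls_sample = ("cat.jpg", 0)
det_sample = ("cat.jpg", {"im_id": 1}, {"gt_bbox": [[0, 0, 10, 10]]})

result_match = try_apply(ClsCompose(), cls_sample)
result_mismatch = try_apply(ClsCompose(), det_sample)

print(result_match)
print(result_mismatch)
```

The second call fails with the same "takes from 2 to 3 positional arguments but 4 were given" wording as the log above, because `self` plus three sample fields makes four arguments. In PaddleX 1.x the classification counterpart of `pdx.datasets.VOCDetection` is `pdx.datasets.ImageNet`. As far as I know it takes the same `data_dir`, `file_list`, `label_list`, and `transforms` arguments, but each line of the file list should be an image path followed by an integer label id rather than an image path followed by an XML annotation path.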

smallwhi, Jul 23 '21 03:07