
Program raises an error during quantization-aware training

Open CashBai opened this issue 3 years ago • 5 comments

The model type is FasterRCNN. The training code is as follows:


Dataset_Dir= 'E:/PaddleX_workspace/datasets/D0001'
Model_Dir= 'E:/PaddleX_workspace/projects/P0001/T0001/output/best_model'
Quant_Dir= 'E:/PaddleX_workspace/projects/P0001/T0001/output/Quant'


import paddlex as pdx
from paddlex import transforms as T

train_transforms = T.Compose([ 
    T.ResizeByShort(short_size=800,max_size=1333), 
    T.RandomHorizontalFlip(),
    T.Normalize()])

eval_transforms = T.Compose([
    T.ResizeByShort(short_size=800,max_size=1333), 
    T.Normalize()])

train_dataset = pdx.datasets.VOCDetection(
    data_dir= Dataset_Dir,
    file_list= Dataset_Dir+'/train_list.txt',
    label_list= Dataset_Dir+'/labels.txt',
    transforms=train_transforms,
    shuffle=True)

eval_dataset = pdx.datasets.VOCDetection(
    data_dir= Dataset_Dir,
    file_list= Dataset_Dir+'/val_list.txt',
    label_list= Dataset_Dir+'/labels.txt',
    transforms=eval_transforms,
    shuffle=False)

model=pdx.load_model(Model_Dir)

model.quant_aware_train(
    num_epochs=50,
    train_dataset=train_dataset,
    train_batch_size=2,
    eval_dataset=eval_dataset,
    learning_rate=0.001,
    warmup_steps=10,
    warmup_start_lr=0.0,
    save_interval_epochs=10,
    lr_decay_epochs=[30, 45],
    save_dir=Quant_Dir,
    use_vdl=True)

Each time training reaches save_interval_epochs, the following error is raised:

Python 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 7.30.1 -- An enhanced Interactive Python.

runfile('C:/Users/Administrator/Desktop/Quant.py', wdir='C:/Users/Administrator/Desktop')
2021-12-16 16:41:21 [INFO]	Starting to read file list from dataset...
2021-12-16 16:41:22 [INFO]	112 samples in file E:/PaddleX_workspace/datasets/D0001/train_list.txt, including 112 positive samples and 0 negative samples.
creating index...
index created!
2021-12-16 16:41:22 [INFO]	Starting to read file list from dataset...
2021-12-16 16:41:22 [INFO]	31 samples in file E:/PaddleX_workspace/datasets/D0001/val_list.txt, including 31 positive samples and 0 negative samples.
creating index...
index created!

W1216 16:41:22.079480 19600 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.5, Runtime API Version: 11.2
W1216 16:41:22.084383 19600 device_context.cc:465] device: 0, cuDNN Version: 8.1.
W1216 16:41:22.469692 19600 device_context.h:397] WARNING: device: 0. The installed Paddle is compiled with CUDNN 8.2, but CUDNN version in your machine is 8.1, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
2021-12-16 16:41:23 [INFO]	Model[FasterRCNN] loaded.
2021-12-16 16:41:23 [INFO]	Preparing the model for quantization-aware training...
2021-12-16 16:41:23 [INFO]	Model is ready for quantization-aware training.
2021-12-16 16:41:31 [INFO]	[TRAIN] Epoch=1/50, Step=10/56, loss_rpn_cls=0.000846, loss_rpn_reg=0.006219, loss_bbox_cls=0.065688, loss_bbox_reg=0.113388, loss=0.186141, lr=0.000900, time_each_step=0.78s, eta=0:36:40
2021-12-16 16:41:38 [INFO]	[TRAIN] Epoch=1/50, Step=20/56, loss_rpn_cls=0.000625, loss_rpn_reg=0.011004, loss_bbox_cls=0.036733, loss_bbox_reg=0.053308, loss=0.101670, lr=0.001000, time_each_step=0.69s, eta=0:31:58
2021-12-16 16:41:45 [INFO]	[TRAIN] Epoch=1/50, Step=30/56, loss_rpn_cls=0.000737, loss_rpn_reg=0.008608, loss_bbox_cls=0.036264, loss_bbox_reg=0.053905, loss=0.099514, lr=0.001000, time_each_step=0.72s, eta=0:33:13
2021-12-16 16:41:52 [INFO]	[TRAIN] Epoch=1/50, Step=40/56, loss_rpn_cls=0.000373, loss_rpn_reg=0.002937, loss_bbox_cls=0.041724, loss_bbox_reg=0.062289, loss=0.107323, lr=0.001000, time_each_step=0.69s, eta=0:31:49
2021-12-16 16:41:59 [INFO]	[TRAIN] Epoch=1/50, Step=50/56, loss_rpn_cls=0.000121, loss_rpn_reg=0.002675, loss_bbox_cls=0.039063, loss_bbox_reg=0.062023, loss=0.103882, lr=0.001000, time_each_step=0.69s, eta=0:31:51
2021-12-16 16:42:03 [INFO]	[TRAIN] Epoch 1 finished, loss_rpn_cls=0.0015499819, loss_rpn_reg=0.0033932289, loss_bbox_cls=0.044911426, loss_bbox_reg=0.07647429, loss=0.12632892 .
2021-12-16 16:42:03 [WARNING]	Detector only supports single card evaluation with batch_size=1 during evaluation, so batch_size is forcibly set to 1.
2021-12-16 16:42:03 [INFO]	Start to evaluate(total_samples=31, total_steps=31)...
Traceback (most recent call last):

  File "C:\Users\ADMINI~1\AppData\Local\Temp/ipykernel_19104/2511553315.py", line 1, in <module>
    runfile('C:/Users/Administrator/Desktop/Quant.py', wdir='C:/Users/Administrator/Desktop')

  File "c:\program files\python38\lib\site-packages\debugpy\_vendored\pydevd\_pydev_bundle\pydev_umd.py", line 167, in runfile
    execfile(filename, namespace)

  File "c:\program files\python38\lib\site-packages\debugpy\_vendored\pydevd\_pydev_imps\_pydev_execfile.py", line 25, in execfile
    exec(compile(contents + "\n", file, 'exec'), glob, loc)

  File "C:/Users/Administrator/Desktop/Quant.py", line 35, in <module>
    model.quant_aware_train(

  File "c:\program files\python38\lib\site-packages\paddlex-2.1.0-py3.8.egg\paddlex\cv\models\detector.py", line 387, in quant_aware_train
    self.train(

  File "c:\program files\python38\lib\site-packages\paddlex-2.1.0-py3.8.egg\paddlex\cv\models\detector.py", line 1373, in train
    super(FasterRCNN, self).train(

  File "c:\program files\python38\lib\site-packages\paddlex-2.1.0-py3.8.egg\paddlex\cv\models\detector.py", line 323, in train
    self.train_loop(

  File "c:\program files\python38\lib\site-packages\paddlex-2.1.0-py3.8.egg\paddlex\cv\models\base.py", line 394, in train_loop
    eval_result = self.evaluate(

  File "c:\program files\python38\lib\site-packages\paddlex-2.1.0-py3.8.egg\paddlex\cv\models\detector.py", line 499, in evaluate
    outputs = self.run(self.net, data, 'eval')

  File "c:\program files\python38\lib\site-packages\paddlex-2.1.0-py3.8.egg\paddlex\cv\models\detector.py", line 105, in run
    net_out = net(inputs)

  File "c:\program files\python38\lib\site-packages\paddle\fluid\dygraph\layers.py", line 914, in __call__
    outputs = self.forward(*inputs, **kwargs)

  File "c:\program files\python38\lib\site-packages\paddlex-2.1.0-py3.8.egg\paddlex\ppdet\modeling\architectures\meta_arch.py", line 71, in forward
    outs.append(self.get_pred())

  File "c:\program files\python38\lib\site-packages\paddlex-2.1.0-py3.8.egg\paddlex\ppdet\modeling\architectures\faster_rcnn.py", line 104, in get_pred
    bbox_pred, bbox_num = self._forward()

  File "c:\program files\python38\lib\site-packages\paddlex-2.1.0-py3.8.egg\paddlex\ppdet\modeling\architectures\faster_rcnn.py", line 90, in _forward
    bbox_pred = self.bbox_post_process.get_pred(bbox, bbox_num,

  File "c:\program files\python38\lib\site-packages\paddle\fluid\dygraph\layers.py", line 1110, in __getattr__
    return object.__getattribute__(self, name)

AttributeError: 'MAOutputScaleLayer' object has no attribute 'get_pred'

CashBai avatar Dec 16 '21 08:12 CashBai
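
The traceback above shows that, at evaluation time, `self.bbox_post_process` is a `MAOutputScaleLayer` rather than the original post-processing layer, and that this object exposes no `get_pred` method. The following is a minimal pure-Python sketch of that general failure mode, not PaddleSlim's actual implementation: every class name here (`BBoxPostProcess`, `OutputScaleWrapper`, `ForwardingWrapper`) is hypothetical and invented for illustration. It assumes a wrapper that only forwards calls, so any extra method on the wrapped object becomes unreachable unless attribute lookups are delegated.

```python
class BBoxPostProcess:
    """Hypothetical stand-in for a layer that has an extra public method."""

    def __call__(self, values):
        return [v * 2 for v in values]

    def get_pred(self, values):
        return [v + 1 for v in values]


class OutputScaleWrapper:
    """Hypothetical wrapper: tracks the largest absolute output, forwards only calls."""

    def __init__(self, layer):
        self._layer = layer
        self.max_abs = 0.0

    def __call__(self, values):
        out = self._layer(values)
        self.max_abs = max(self.max_abs, max(abs(v) for v in out))
        return out


class ForwardingWrapper(OutputScaleWrapper):
    """Same wrapper, but unknown attributes are delegated to the wrapped layer."""

    def __getattr__(self, name):
        # __getattr__ runs only after normal lookup fails.
        # Guard against recursion if _layer itself is not set yet.
        if name == "_layer":
            raise AttributeError(name)
        return getattr(self._layer, name)


if __name__ == "__main__":
    plain = OutputScaleWrapper(BBoxPostProcess())
    try:
        plain.get_pred([1, 2])
    except AttributeError as exc:
        print("plain wrapper:", exc)

    forwarding = ForwardingWrapper(BBoxPostProcess())
    print("forwarding wrapper:", forwarding.get_pred([1, 2]))
```

The thread does not say how the maintainers fixed the issue, so the delegating variant is shown only to illustrate one common way such wrappers avoid hiding methods.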

Which versions of paddlepaddle-gpu and paddleslim are you using?

will-jl944 avatar Dec 21 '21 06:12 will-jl944
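
One way to collect the installed versions for a report like this is to query package metadata with the standard library. This is a minimal sketch: the helper name `installed_versions` is hypothetical, and it assumes the packages were installed under the distribution names `paddlepaddle-gpu`, `paddleslim`, and `paddlex`. Packages installed in unusual ways may not be found this way and are reported as missing.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not found}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["paddlepaddle-gpu", "paddleslim", "paddlex"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not found'}")
```

`importlib.metadata` is available from Python 3.8, which matches the interpreter shown in the log above.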

> Which versions of paddlepaddle-gpu and paddleslim are you using?

I installed according to the requirements.txt of PaddleX 2.1.0: paddlepaddle-gpu is 2.2.0 and paddleslim is 2.2.1.

CashBai avatar Dec 22 '21 01:12 CashBai

I ran into the same problem with PaddleX 2.3 when quantizing MaskRCNN. Has this issue been resolved, or is there a way to work around it?

SeventhBlue avatar Mar 30 '22 08:03 SeventhBlue

The issue above has been fixed.

yghstill avatar Apr 14 '22 02:04 yghstill

We tried it, and the bug above is indeed no longer reported. Thank you very much!

SeventhBlue avatar Apr 14 '22 02:04 SeventhBlue