PaddleX
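Object detection model FasterRCNN: intermittent errors during image prediction
Problem description
FasterRCNN was used as the baseline to train an object detection model. After training, the model was deployed to an on-site industrial PC for image prediction. It works correctly most of the time, but intermittent errors occur several times a day.
Environment
1. Windows 10 Enterprise / i7-9700 CPU / 16 GB RAM / 64-bit OS / 2080Ti
2. python==3.9.7; paddlepaddle-gpu==2.2.1; paddlex==2.0.0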
Model training code
import os

import paddlex as pdx
from paddlex import transforms

# Training-time augmentation and preprocessing
train_transforms = transforms.Compose([
    transforms.RandomDistort(),
    transforms.RandomHorizontalFlip(),
    transforms.ResizeByShort(short_size=1024, max_size=2048),
    transforms.Normalize(),
])

# Evaluation-time preprocessing (no augmentation)
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=1024, max_size=2048),
    transforms.Normalize(),
])

root_path = 'Full'
train_dataset = pdx.datasets.VOCDetection(
    data_dir=root_path,
    file_list=os.path.join(root_path, 'train_list.txt'),
    label_list=os.path.join(root_path, 'labels.txt'),
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
    data_dir=root_path,
    file_list=os.path.join(root_path, 'val_list.txt'),
    label_list=os.path.join(root_path, 'labels.txt'),
    transforms=eval_transforms)

# Add defect-free background images as negative samples
train_dataset.add_negative_samples(image_dir='Background')

num_classes = len(train_dataset.labels) + 1
model = pdx.det.FasterRCNN(
    num_classes=num_classes,
    backbone='ResNet50_vd_ssld',
    with_dcn=True,
    fpn_num_channels=64,
    with_fpn=True,
    test_pre_nms_top_n=500,
    test_post_nms_top_n=300)

model.train(
    num_epochs=20,
    train_dataset=train_dataset,
    train_batch_size=4,
    eval_dataset=eval_dataset,
    save_interval_epochs=1,
    metric='VOC',
    learning_rate=0.01,
    lr_decay_epochs=[12, 16],
    warmup_steps=500,
    save_dir='Output/Full/faster_rcnn_r50_vd_dcn',
    use_vdl=True,
    early_stop=True)
Export the model
paddlex --export_inference --model_dir=Output/faster_rcnn_r50_vd_dcn/best_model --save_dir=Output/faster_rcnn_r50_vd_dcn/
Model prediction code
model = pdx.load_model(path_to_model)
result = model.predict(image)
model.yml
Model: FasterRCNN
Transforms:
- ResizeByShort:
    interp: LINEAR
    max_size: 2048
    short_size: 1024
- Normalize:
    is_scale: true
    mean:
    - 0.485
    - 0.456
    - 0.406
    std:
    - 0.229
    - 0.224
    - 0.225
- Padding:
    im_padding_value:
    - 0.0
    - 0.0
    - 0.0
    label_padding_value: 255
    offsets: null
    pad_mode: 0
    size_divisor: 32
    target_size: null
_Attributes:
  eval_metrics:
    bbox_map: 82.8894546295336
  fixed_input_shape:
  - -1
  - 3
  - -1
  - -1
  labels:
  - guahua
  - liehen
  - posun
  - queliao
  - waixie
  - zangwu
  model_type: detector
  num_classes: 7
_init_params:
  anchor_sizes:
  - - 32
  - - 64
  - - 128
  - - 256
  - - 512
  aspect_ratios:
  - 0.5
  - 1.0
  - 2.0
  backbone: ResNet50_vd_ssld
  fpn_num_channels: 64
  keep_top_k: 100
  nms_threshold: 0.5
  num_classes: 7
  rpn_batch_size_per_im: 256
  rpn_fg_fraction: 0.5
  score_threshold: 0.05
  test_post_nms_top_n: 300
  test_pre_nms_top_n: 500
  with_dcn: true
  with_fpn: true
completed_epochs: 0
status: Infer
version: 2.0.0
Error messages
Type 1: ERROR The dims of Inputs(Condition) and Inputs(X) should be same. But received Condition's shape is [3, 1], X's shape is [1, 1] [Hint: Expected cond_dims == x_dims, but received cond_dims:3, 1 != x_dims:1, 1.] (at C:/home/workspace/Paddle_release2/paddle/fluid/operators/where_op.cc:38) [operator < where > error]
Type 2: ERROR The dims of Inputs(Condition) and Inputs(X) should be same. But received Condition's shape is [2, 1], X's shape is [1, 1] [Hint: Expected cond_dims == x_dims, but received cond_dims:2, 1 != x_dims:1, 1.] (at C:/home/workspace/Paddle_release2/paddle/fluid/operators/where_op.cc:38) [operator < where > error]
Type 3: ERROR Dims of all Inputs(X) must be the same, but received input 1 dim is:1 not equal to input 0 dim:2. [Hint: Expected input_dims[i] == input_dims[0], but received input_dims[i]:1 != input_dims[0]:2.] (at C:/home/workspace/Paddle_release2/paddle/fluid/operators/stack_op.cc:46) [operator < stack > error]
Type 4: ERROR Dims of all Inputs(X) must be the same, but received input 1 dim is:1 not equal to input 0 dim:4. [Hint: Expected input_dims[i] == input_dims[0], but received input_dims[i]:1 != input_dims[0]:4.] (at C:/home/workspace/Paddle_release2/paddle/fluid/operators/stack_op.cc:46) [operator < stack > error]
Type 5: ERROR Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [5] and the shape of Y = [3]. Received [5] in X is not equal to [3] in Y at i:0. [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at C:\home\workspace\Paddle_release2\paddle/fluid/operators/elementwise/elementwise_op_function.h:169) [operator < elementwise_min > error]
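The hint in the fifth error quotes the per-dimension rule that the elementwise operator checks: two sizes are compatible when they are equal or when either one is at most 1. A minimal plain-Python sketch of that rule is shown here for reading the log. The function name `broadcast_compatible` is hypothetical, and aligning the shapes from the trailing dimension is an assumption, not something the log confirms.

```python
from itertools import zip_longest


def broadcast_compatible(x_dims, y_dims):
    """Return True if two shapes pass the per-dimension check quoted in the hint.

    Assumption: shapes are aligned from the trailing dimension, and missing
    leading dimensions are treated as size 1.
    """
    pairs = zip_longest(reversed(x_dims), reversed(y_dims), fillvalue=1)
    return all(x == y or x <= 1 or y <= 1 for x, y in pairs)


print(broadcast_compatible([5], [3]))  # the shapes reported in the fifth error
print(broadcast_compatible([5], [1]))  # a size-1 operand satisfies the rule
```

The first call reproduces the failing combination from the log; the second shows a pair of shapes that satisfies the same rule.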
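Judging from the errors, there is a problem with the input on some calls. Can you confirm that the input given to the model at the moment of an error is the same as at other times?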
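A Hikvision line-scan camera captures images via online triggering, and every image fed to the model is 2048*1024*3. The images corresponding to the error moments are saved in real time and are consistent with the images from normal runs. The model runs a little over ten thousand times a day, and the run log shows about seven or eight errors.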
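From the code, the input to the model on each call is image, which is the 2048*1024, 3-channel image you mentioned, right? Does "saved in real time" mean that the input is saved to a local file before every prediction? And when an error occurs, reloading the saved image and predicting again works fine, right? If convenient, please post the preprocessing code applied to image before prediction, along with the images saved when the errors occurred.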
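One way to answer the reload question systematically is to replay every saved failing input several times and count how often each one fails. The sketch below assumes nothing about PaddleX: `replay` is a hypothetical helper, and `predict_fn` stands in for whatever callable performs the real prediction.

```python
from collections import Counter


def replay(predict_fn, inputs, repeats=3):
    """Run predict_fn on every input `repeats` times and count failures.

    predict_fn is any callable taking one input; it is a stand-in for the
    real prediction call, which is not reproduced here.
    Returns a Counter mapping input index -> number of failed runs.
    """
    failures = Counter()
    for index, item in enumerate(inputs):
        for _ in range(repeats):
            try:
                predict_fn(item)
            except Exception:
                failures[index] += 1
    return failures


def fake_predict(value):
    # Stand-in predictor used only to demonstrate the helper.
    if value < 0:
        raise ValueError("simulated failure")
    return value


print(replay(fake_predict, [1, -1, 2], repeats=2))
```

An input that fails on every replay would suggest the data itself is involved, while inputs that never fail offline would suggest the cause lies in the runtime state at the moment of the error.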
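After the camera is triggered, the image is first stored locally and then read back for loading and prediction. Reloading the image saved at the time of an error and predicting again works fine, and the image itself has no problems. The preprocessing code applied to image before prediction:
img = cv2.imread(imagepath)
rows, cols, channels = img.shape
black = np.zeros([rows, cols, channels], img.dtype)
# Blend with an all-black image: computes img * c + b
original = cv2.addWeighted(img, c, black, 1-c, b)
try:
    result_full = predict.predict_img(self.model_full, original)
    check_pic = visualize.visualize_detection(original.copy(), result_full, threshold=config.threshold,
                                              save_dir=config.today.get_check_full_path())
except Exception as e:
    log.error(e)
    # Save the image that triggered the error
    self.savePic(num, '00', 'kadun', original)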
Images saved when the errors occurred: link: https://pan.baidu.com/s/1C7JkJ1R_TPTnj3fXrUc7tg extraction code: urru
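The exception handler in the preprocessing code logs only the exception message. To compare a failing call with a normal one, it may help to also record the input's shape, dtype, a content hash, and the full traceback. The sketch below is one possible way to do that. Both function names are hypothetical, `predict_fn` stands in for the real prediction call, and the image only needs to expose `shape`, `dtype`, and `tobytes()`, as NumPy arrays do.

```python
import hashlib
import traceback


def describe_input(img):
    """Summarize an array-like image exposing shape, dtype and tobytes()."""
    return {
        "shape": tuple(img.shape),
        "dtype": str(img.dtype),
        "sha256": hashlib.sha256(img.tobytes()).hexdigest(),
    }


def predict_with_diagnostics(predict_fn, img, log_fn=print):
    """Call predict_fn(img); on failure, log input details and re-raise.

    predict_fn and log_fn are placeholders for the real prediction call
    and the application's logger.
    """
    try:
        return predict_fn(img)
    except Exception:
        log_fn("prediction failed for input %s\n%s"
               % (describe_input(img), traceback.format_exc()))
        raise
```

If the hash logged at failure time matches the hash of the saved file that later predicts correctly, the two inputs are byte-identical, which would point away from the image data as the cause.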