PaddleSlim: Which dataset should be used to compress the V4 server detection model?

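Which dataset should I use to compress the V4 server detection model?

I used my own data, and the validation accuracy is poor. Which dataset can I use for validation so that the model's accuracy is preserved as much as possible?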

May 13 '24 09:05 TinyQi

Generally, the training data is sufficient. Also, which compression method are you using?

May 13 '24 10:05 ceci3

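"Generally, the training data is sufficient. Also, which compression method are you using?"

I used my own training set, which has only about 1,000 images, and the model's accuracy after compression is very poor. In the first few iterations there was at least some accuracy, but later on it degraded to what is shown in the figure below.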

May 13 '24 10:05 TinyQi

"Generally, the training data is sufficient. Also, which compression method are you using?"

I am not sure which compression method I am using. I followed your documentation and only changed the dataset and model paths, and I adjusted the learning rate according to the batch size and the number of GPUs. Here is my configuration file; could you check whether anything is wrong with it?

Global:
  model_type: det
  model_dir: /share/disk3/xcq/02.model_cache/pretrain_models/ch_PP-OCRv4_server_det_guding_shuru_1_output_bak/  # fixed output
  model_filename: inference.pdmodel
  params_filename: inference.pdiparams
  algorithm: DB

Distillation:
  alpha: 1.0
  loss: l2

QuantAware:
  use_pact: true
  activation_bits: 8
  is_full_quantize: false
  onnx_format: True
  activation_quantize_type: moving_average_abs_max
  weight_quantize_type: channel_wise_abs_max
  not_quant_pattern:
  - skip_quant
  quantize_op_types:
  - conv2d
  - depthwise_conv2d
  weight_bits: 8

TrainConfig:
  epochs: 5
  eval_iter: 200
  learning_rate:
    type: CosineAnnealingDecay
    learning_rate: 0.00000625
  optimizer_builder:
    optimizer:
      type: Adam
    weight_decay: 5.0e-05

PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5

Metric:
  name: DetMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /
    label_file_list:
    - /share/disk3/xcq/01.ImageData/028.LocationCharacterRecognition/train/ready_to_train/2023-11-29det/train.txt
    ratio_list: [1.0]
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - IaaAugment:
        augmenter_args:
        - type: Fliplr
          args:
            p: 0.5
        - type: Affine
          args:
            rotate:
            - -10
            - 10
        - type: Resize
          args:
            size:
            - 0.5
            - 3
    - EastRandomCropData:
        size:
        - 960
        - 960
        max_tries: 50
        keep_ratio: true
    - MakeBorderMap:
        shrink_ratio: 0.4
        thresh_min: 0.3
        thresh_max: 0.7
    - MakeShrinkMap:
        shrink_ratio: 0.4
        min_text_size: 8
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - threshold_map
        - threshold_mask
        - shrink_map
        - shrink_mask
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 1
    num_workers: 0

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /
    label_file_list:
    - /share/disk3/xcq/01.ImageData/028.LocationCharacterRecognition/train/ready_to_train/2023-11-29det/test.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - DetLabelEncode: null
    - DetResizeForTest:
        # limit_side_len: 960
        # limit_type: 'max'
        image_shape: [960,960]
        keep_ratio: false
    - NormalizeImage:
        scale: 1./255.
        mean:
        - 0.485
        - 0.456
        - 0.406
        std:
        - 0.229
        - 0.224
        - 0.225
        order: hwc
    - ToCHWImage: null
    - KeepKeys:
        keep_keys:
        - image
        - shape
        - polys
        - ignore_tags
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 1
    num_workers: 0
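The learning-rate adjustment described above is commonly done with the linear scaling rule, which keeps the ratio of learning rate to total batch size constant. A minimal sketch follows. The reference values (learning rate 0.00005 at a total batch size of 8) and the single-card setup are assumptions that this thread does not confirm, and the function name `scale_learning_rate` is hypothetical.

```python
def scale_learning_rate(base_lr, base_total_batch, batch_size_per_card, num_cards):
    """Linear scaling rule: keep lr / total_batch_size constant."""
    if base_total_batch <= 0 or batch_size_per_card <= 0 or num_cards <= 0:
        raise ValueError("batch sizes and card count must be positive")
    total_batch = batch_size_per_card * num_cards
    return base_lr * total_batch / base_total_batch


# Assumed reference setup (hypothetical): lr 5e-5 tuned for a total batch of 8.
BASE_LR = 5e-5
BASE_TOTAL_BATCH = 8

# batch_size_per_card comes from the config above; a single card is an assumption.
lr = scale_learning_rate(BASE_LR, BASE_TOTAL_BATCH, batch_size_per_card=1, num_cards=1)
print(f"scaled learning rate: {lr:.8f}")
```

Under those assumed reference values the result matches the 0.00000625 in the config. A rate this small over 5 epochs of roughly 1,000 images may also be worth revisiting when accuracy collapses during training.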

May 13 '24 10:05 TinyQi

There are also three things I would like to report, which may make it easier for you to troubleshoot.

1. The model I use has a fixed input shape. When I exported the open-source V4 static model, I hard-coded the input size, so the exported model takes a fixed input (shape: [1,3,960,960]).
2. I changed the model's outputs from the original two to a single one (keeping only sigmoid_11.tmp_0), because running quantization-aware training directly on the V4 model failed with the error below. I compared it with the V3 model from the example and found that the V3 model has only one output, so I tried keeping a single output. With a single output the run no longer crashes, but the quantization result is unsatisfactory.

   Traceback (most recent call last):
     File "run.py", line 157, in <module>
       main()
     File "run.py", line 150, in main
       ac.compress()
     File "/home/anaconda3/envs/paddle_2.4.1_gpu/lib/python3.7/site-packages/paddleslim/auto_compression/compressor.py", line 594, in compress
       train_config)
     File "/home/anaconda3/envs/paddle_2.4.1_gpu/lib/python3.7/site-packages/paddleslim/auto_compression/compressor.py", line 776, in single_strategy_compress
       train_program_info, test_program_info, strategy, train_config)
     File "/home/anaconda3/envs/paddle_2.4.1_gpu/lib/python3.7/site-packages/paddleslim/auto_compression/compressor.py", line 825, in _start_train
       test_program_info.fetch_targets)
     File "run.py", line 80, in eval_function
       fetch_list=test_fetch_list)
   ValueError: too many values to unpack (expected 1)
   img.shape:(960, 960, 3)

3. I am using the open-source V4 detection model. When I previously ran ordinary fine-tuning on the in-house dataset mentioned above, the fine-tuned results were always worse than the original open-source model. This is the main reason I suspect the dataset.

I hope this helps you troubleshoot.

May 13 '24 10:05 TinyQi

The quantization training script is: PaddleSlim/example/auto_compression/ocr/run.py
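The ValueError reported earlier in this thread is what Python raises when a single-target unpacking receives more than one value. That would happen if the evaluation code unpacks exactly one fetched result while the model exposes two outputs. Below is a minimal sketch of that failure and one possible workaround, using a stand-in for the executor call. `fake_run`, `select_output`, and the name `output_a` are hypothetical, and this is not the actual code of run.py.

```python
def fake_run(fetch_list):
    """Stand-in for an executor call: returns one result per fetch target."""
    return [f"tensor_for_{name}" for name in fetch_list]


def select_output(outputs, fetch_names, wanted=None):
    """Pick one output by name, or fall back to the first one."""
    if wanted is not None and wanted in fetch_names:
        return outputs[fetch_names.index(wanted)]
    return outputs[0]


# Placeholder fetch names for a two-output model; "output_a" is made up.
two_outputs = ["output_a", "sigmoid_11.tmp_0"]

# Single-target unpacking fails when two values come back.
error_message = None
try:
    (preds,) = fake_run(two_outputs)
except ValueError as err:
    error_message = str(err)
    print("unpack failed:", error_message)

# Selecting the wanted output explicitly avoids the crash.
preds = select_output(fake_run(two_outputs), two_outputs, wanted="sigmoid_11.tmp_0")
print(preds)
```

Selecting the output inside the evaluation function leaves the exported model untouched. Whether keeping both outputs changes the quantization result is not settled in this thread.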

May 13 '24 10:05 TinyQi

Which model is the V4 model, and how exactly can I obtain it?

May 14 '24 12:05 ceci3

I ran into the same problem.

Jul 08 '24 12:07 huangguifeng
