PaddleOCR
KIE training on a custom dataset: the pretrained model specified in the config file does not take effect
Search before asking
- [X] I have searched the PaddleOCR Docs and found no similar bug report.
- [X] I have searched the PaddleOCR Issues and found no similar bug report.
- [X] I have searched the PaddleOCR Discussions and found no similar bug report.
Bug
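As the title says: I am training on Baidu AI Studio, following the official documentation at https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_ch/kie.md. I started with the SER model, using the following config: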
Global:
  use_gpu: True
  epoch_num: &epoch_num 20
  log_smooth_window: 10
  print_batch_step: 10
  save_model_dir: ./output/ccic/ser_vi_layoutxlm_xfund_zh
  save_epoch_step: 2000
  # evaluation is run every 19 iterations after the 0th iteration
  eval_batch_step: [ 0, 19 ]
  cal_metric_during_train: False
  pretrained_model: ./pretrained_model/ser_vi_layoutxlm_xfund_pretrained
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/kie/input/zh_val_42.jpg
  d2s_train_image_shape: [3, 224, 224]
  # if you want to predict using the groundtruth ocr info,
  # you can use the following config
  # infer_img: train_data/XFUND/zh_val/val.json
  # infer_mode: False
  save_res_path: ./output/ccic/ser/xfund_zh/res
  kie_rec_model_dir:
  kie_det_model_dir:
  amp_custom_white_list: ['scale', 'concat', 'elementwise_add']
Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutXLM"
  Transform:
  Backbone:
    name: LayoutXLMForSer
    pretrained: True
    checkpoints:
    # one of base or vi
    mode: vi
    num_classes: &num_classes 7
Loss:
  name: VQASerTokenLayoutLMLoss
  num_classes: *num_classes
  key: "backbone_out"
Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Linear
    learning_rate: 0.00001
    epochs: *epoch_num
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 0.00000
PostProcess:
  name: VQASerTokenLayoutLMPostProcess
  class_path: &class_path train_data/XCCIC_8020/class_list_xfun.txt
Metric:
  name: VQASerTokenMetric
  main_indicator: hmean
Train:
  dataset:
    name: SimpleDataSet
    data_dir: train_data/XCCIC_8020/zh_train/image
    label_file_list:
      - train_data/XCCIC_8020/zh_train/train.json
    ratio_list: [ 1.0 ]
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: &use_textline_bbox_info True
          # one of [None, "tb-yx"]
          order_method: &order_method "tb-yx"
      - VQATokenPad:
          max_seq_len: &max_seq_len 512
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 8
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: train_data/XCCIC_8020/zh_val/image
    label_file_list:
      - train_data/XCCIC_8020/zh_val/val.json
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: *use_textline_bbox_info
          order_method: *order_method
      - VQATokenPad:
          max_seq_len: *max_seq_len
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 8
    num_workers: 4
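pretrained_model: ./pretrained_model/ser_vi_layoutxlm_xfund_pretrained is the line I added to the config. When I run the training command:
%cd /home/aistudio/PaddleOCR
!python3 tools/train.py -c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml
the log shows that the default model is still downloaded and the pretrained model I configured is not used.
What I need: to train on my custom data starting from the pretrained model provided in the official documentation.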
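One possible explanation, offered as an assumption rather than confirmed behaviour: for KIE models such as LayoutXLM, the training code may take its weights from `Architecture.Backbone.checkpoints` (left empty in the config above) rather than from `Global.pretrained_model`. The sketch below only assembles a command line that overrides that field through the `-o` option of `tools/train.py`; the helper name `build_train_command` is hypothetical and not part of PaddleOCR.

```python
import shlex
from typing import List


def build_train_command(config_path: str, checkpoints_dir: str) -> List[str]:
    """Build the argument list for a training run with a config override.

    Assumption: the KIE backbone reads its weights from
    Architecture.Backbone.checkpoints, so that key is overridden via -o.
    """
    override = f"Architecture.Backbone.checkpoints={checkpoints_dir}"
    return ["python3", "tools/train.py", "-c", config_path, "-o", override]


if __name__ == "__main__":
    cmd = build_train_command(
        "configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml",
        "./pretrained_model/ser_vi_layoutxlm_xfund_pretrained",
    )
    # Print a copy-pasteable shell command; nothing is executed here.
    print(shlex.join(cmd))
```

If the assumption holds, the startup log should show a non-empty `checkpoints` entry under `Backbone` instead of `None`.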
/home/aistudio/PaddleOCR
[2024/08/09 10:41:58] ppocr INFO: Architecture :
[2024/08/09 10:41:58] ppocr INFO: Backbone :
[2024/08/09 10:41:58] ppocr INFO: checkpoints : None
[2024/08/09 10:41:58] ppocr INFO: mode : vi
[2024/08/09 10:41:58] ppocr INFO: name : LayoutXLMForSer
[2024/08/09 10:41:58] ppocr INFO: num_classes : 7
[2024/08/09 10:41:58] ppocr INFO: pretrained : True
[2024/08/09 10:41:58] ppocr INFO: Transform : None
[2024/08/09 10:41:58] ppocr INFO: algorithm : LayoutXLM
[2024/08/09 10:41:58] ppocr INFO: model_type : kie
[2024/08/09 10:41:58] ppocr INFO: Eval :
[2024/08/09 10:41:58] ppocr INFO: dataset :
[2024/08/09 10:41:58] ppocr INFO: data_dir : train_data/XCCIC_8020/zh_val/image
[2024/08/09 10:41:58] ppocr INFO: label_file_list : ['train_data/XCCIC_8020/zh_val/val.json']
[2024/08/09 10:41:58] ppocr INFO: name : SimpleDataSet
[2024/08/09 10:41:58] ppocr INFO: transforms :
[2024/08/09 10:41:58] ppocr INFO: DecodeImage :
[2024/08/09 10:41:58] ppocr INFO: channel_first : False
[2024/08/09 10:41:58] ppocr INFO: img_mode : RGB
[2024/08/09 10:41:58] ppocr INFO: VQATokenLabelEncode :
[2024/08/09 10:41:58] ppocr INFO: algorithm : LayoutXLM
[2024/08/09 10:41:58] ppocr INFO: class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/09 10:41:58] ppocr INFO: contains_re : False
[2024/08/09 10:41:58] ppocr INFO: order_method : tb-yx
[2024/08/09 10:41:58] ppocr INFO: use_textline_bbox_info : True
[2024/08/09 10:41:58] ppocr INFO: VQATokenPad :
[2024/08/09 10:41:58] ppocr INFO: max_seq_len : 512
[2024/08/09 10:41:58] ppocr INFO: return_attention_mask : True
[2024/08/09 10:41:58] ppocr INFO: VQASerTokenChunk :
[2024/08/09 10:41:58] ppocr INFO: max_seq_len : 512
[2024/08/09 10:41:58] ppocr INFO: Resize :
[2024/08/09 10:41:58] ppocr INFO: size : [224, 224]
[2024/08/09 10:41:58] ppocr INFO: NormalizeImage :
[2024/08/09 10:41:58] ppocr INFO: mean : [123.675, 116.28, 103.53]
[2024/08/09 10:41:58] ppocr INFO: order : hwc
[2024/08/09 10:41:58] ppocr INFO: scale : 1
[2024/08/09 10:41:58] ppocr INFO: std : [58.395, 57.12, 57.375]
[2024/08/09 10:41:58] ppocr INFO: ToCHWImage : None
[2024/08/09 10:41:58] ppocr INFO: KeepKeys :
[2024/08/09 10:41:58] ppocr INFO: keep_keys : ['input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
[2024/08/09 10:41:58] ppocr INFO: loader :
[2024/08/09 10:41:58] ppocr INFO: batch_size_per_card : 8
[2024/08/09 10:41:58] ppocr INFO: drop_last : False
[2024/08/09 10:41:58] ppocr INFO: num_workers : 4
[2024/08/09 10:41:58] ppocr INFO: shuffle : False
[2024/08/09 10:41:58] ppocr INFO: Global :
[2024/08/09 10:41:58] ppocr INFO: amp_custom_white_list : ['scale', 'concat', 'elementwise_add']
[2024/08/09 10:41:58] ppocr INFO: cal_metric_during_train : False
[2024/08/09 10:41:58] ppocr INFO: d2s_train_image_shape : [3, 224, 224]
[2024/08/09 10:41:58] ppocr INFO: distributed : False
[2024/08/09 10:41:58] ppocr INFO: epoch_num : 20
[2024/08/09 10:41:58] ppocr INFO: eval_batch_step : [0, 19]
[2024/08/09 10:41:58] ppocr INFO: infer_img : ppstructure/docs/kie/input/zh_val_42.jpg
[2024/08/09 10:41:58] ppocr INFO: kie_det_model_dir : None
[2024/08/09 10:41:58] ppocr INFO: kie_rec_model_dir : None
[2024/08/09 10:41:58] ppocr INFO: log_smooth_window : 10
[2024/08/09 10:41:58] ppocr INFO: pretrained_model : ./pretrained_model/ser_vi_layoutxlm_xfund_pretrained
[2024/08/09 10:41:58] ppocr INFO: print_batch_step : 10
[2024/08/09 10:41:58] ppocr INFO: save_epoch_step : 2000
[2024/08/09 10:41:58] ppocr INFO: save_inference_dir : None
[2024/08/09 10:41:58] ppocr INFO: save_model_dir : ./output/ccic/ser_vi_layoutxlm_xfund_zh
[2024/08/09 10:41:58] ppocr INFO: save_res_path : ./output/ccic/ser/xfund_zh/res
[2024/08/09 10:41:58] ppocr INFO: seed : 2022
[2024/08/09 10:41:58] ppocr INFO: use_gpu : True
[2024/08/09 10:41:58] ppocr INFO: use_visualdl : False
[2024/08/09 10:41:58] ppocr INFO: Loss :
[2024/08/09 10:41:58] ppocr INFO: key : backbone_out
[2024/08/09 10:41:58] ppocr INFO: name : VQASerTokenLayoutLMLoss
[2024/08/09 10:41:58] ppocr INFO: num_classes : 7
[2024/08/09 10:41:58] ppocr INFO: Metric :
[2024/08/09 10:41:58] ppocr INFO: main_indicator : hmean
[2024/08/09 10:41:58] ppocr INFO: name : VQASerTokenMetric
[2024/08/09 10:41:58] ppocr INFO: Optimizer :
[2024/08/09 10:41:58] ppocr INFO: beta1 : 0.9
[2024/08/09 10:41:58] ppocr INFO: beta2 : 0.999
[2024/08/09 10:41:58] ppocr INFO: lr :
[2024/08/09 10:41:58] ppocr INFO: epochs : 20
[2024/08/09 10:41:58] ppocr INFO: learning_rate : 1e-05
[2024/08/09 10:41:58] ppocr INFO: name : Linear
[2024/08/09 10:41:58] ppocr INFO: warmup_epoch : 2
[2024/08/09 10:41:58] ppocr INFO: name : AdamW
[2024/08/09 10:41:58] ppocr INFO: regularizer :
[2024/08/09 10:41:58] ppocr INFO: factor : 0.0
[2024/08/09 10:41:58] ppocr INFO: name : L2
[2024/08/09 10:41:58] ppocr INFO: PostProcess :
[2024/08/09 10:41:58] ppocr INFO: class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/09 10:41:58] ppocr INFO: name : VQASerTokenLayoutLMPostProcess
[2024/08/09 10:41:58] ppocr INFO: Train :
[2024/08/09 10:41:58] ppocr INFO: dataset :
[2024/08/09 10:41:58] ppocr INFO: data_dir : train_data/XCCIC_8020/zh_train/image
[2024/08/09 10:41:58] ppocr INFO: label_file_list : ['train_data/XCCIC_8020/zh_train/train.json']
[2024/08/09 10:41:58] ppocr INFO: name : SimpleDataSet
[2024/08/09 10:41:58] ppocr INFO: ratio_list : [1.0]
[2024/08/09 10:41:58] ppocr INFO: transforms :
[2024/08/09 10:41:58] ppocr INFO: DecodeImage :
[2024/08/09 10:41:58] ppocr INFO: channel_first : False
[2024/08/09 10:41:58] ppocr INFO: img_mode : RGB
[2024/08/09 10:41:58] ppocr INFO: VQATokenLabelEncode :
[2024/08/09 10:41:58] ppocr INFO: algorithm : LayoutXLM
[2024/08/09 10:41:58] ppocr INFO: class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/09 10:41:58] ppocr INFO: contains_re : False
[2024/08/09 10:41:58] ppocr INFO: order_method : tb-yx
[2024/08/09 10:41:58] ppocr INFO: use_textline_bbox_info : True
[2024/08/09 10:41:58] ppocr INFO: VQATokenPad :
[2024/08/09 10:41:58] ppocr INFO: max_seq_len : 512
[2024/08/09 10:41:58] ppocr INFO: return_attention_mask : True
[2024/08/09 10:41:58] ppocr INFO: VQASerTokenChunk :
[2024/08/09 10:41:58] ppocr INFO: max_seq_len : 512
[2024/08/09 10:41:58] ppocr INFO: Resize :
[2024/08/09 10:41:58] ppocr INFO: size : [224, 224]
[2024/08/09 10:41:58] ppocr INFO: NormalizeImage :
[2024/08/09 10:41:58] ppocr INFO: mean : [123.675, 116.28, 103.53]
[2024/08/09 10:41:58] ppocr INFO: order : hwc
[2024/08/09 10:41:58] ppocr INFO: scale : 1
[2024/08/09 10:41:58] ppocr INFO: std : [58.395, 57.12, 57.375]
[2024/08/09 10:41:58] ppocr INFO: ToCHWImage : None
[2024/08/09 10:41:58] ppocr INFO: KeepKeys :
[2024/08/09 10:41:58] ppocr INFO: keep_keys : ['input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
[2024/08/09 10:41:58] ppocr INFO: loader :
[2024/08/09 10:41:58] ppocr INFO: batch_size_per_card : 8
[2024/08/09 10:41:58] ppocr INFO: drop_last : False
[2024/08/09 10:41:58] ppocr INFO: num_workers : 4
[2024/08/09 10:41:58] ppocr INFO: shuffle : True
[2024/08/09 10:41:58] ppocr INFO: profiler_options : None
[2024/08/09 10:41:58] ppocr INFO: train with paddle 2.5.2 and device Place(gpu:0)
[2024/08/09 10:41:58] ppocr INFO: Initialize indexs of datasets:['train_data/XCCIC_8020/zh_train/train.json']
list index out of range
[2024-08-09 10:41:59,583] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/sentencepiece.bpe.model and saved to /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased
[2024-08-09 10:41:59,640] [ INFO] - Downloading sentencepiece.bpe.model from https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/sentencepiece.bpe.model
100%|██████████████████████████████████████| 4.83M/4.83M [00:00<00:00, 5.25MB/s]
[2024-08-09 10:42:01,488] [ INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2024-08-09 10:42:01,488] [ INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2024/08/09 10:42:01] ppocr INFO: Initialize indexs of datasets:['train_data/XCCIC_8020/zh_val/val.json']
[2024-08-09 10:42:01,490] [ INFO] - Already cached /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2024-08-09 10:42:02,249] [ INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2024-08-09 10:42:02,249] [ INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2024-08-09 10:42:02,252] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/vi-layoutxlm-base-uncased/model_state.pdparams and saved to /home/aistudio/.paddlenlp/models/vi-layoutxlm-base-uncased
[2024-08-09 10:42:02,252] [ INFO] - Downloading model_state.pdparams from https://bj.bcebos.com/paddlenlp/models/transformers/vi-layoutxlm-base-uncased/model_state.pdparams
100%|██████████████████████████████████████| 1.04G/1.04G [00:13<00:00, 80.3MB/s]
W0809 10:42:16.289948 80856 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W0809 10:42:16.291229 80856 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
[2024-08-09 10:42:19,987] [ INFO] - Weights of LayoutXLMForTokenClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
[2024/08/09 10:42:20] ppocr INFO: train dataloader has 18 iters
[2024/08/09 10:42:20] ppocr INFO: valid dataloader has 5 iters
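To make the observation about the log easier to reproduce, here is a minimal sketch that scans a saved training log for the "Downloading ... from ..." lines and reports whether any `.pdparams` weights file was fetched remotely. The function names and the regular expression are illustrative assumptions based only on the log format shown above.

```python
import re
from typing import List

# Matches lines such as:
#   "... - Downloading model_state.pdparams from https://.../model_state.pdparams"
DOWNLOAD_RE = re.compile(r"Downloading (\S+) from (https?://\S+)")


def find_remote_downloads(log_text: str) -> List[str]:
    """Return the URLs of all files the log reports as being downloaded."""
    return [match.group(2) for match in DOWNLOAD_RE.finditer(log_text)]


def downloaded_model_weights(log_text: str) -> bool:
    """True if at least one downloaded file is a .pdparams weights file."""
    return any(url.endswith(".pdparams") for url in find_remote_downloads(log_text))


if __name__ == "__main__":
    sample = (
        "[2024-08-09 10:42:02,252] [ INFO] - Downloading model_state.pdparams from "
        "https://bj.bcebos.com/paddlenlp/models/transformers/"
        "vi-layoutxlm-base-uncased/model_state.pdparams\n"
    )
    print(downloaded_model_weights(sample))
```

A download of the base backbone weights does not by itself prove that the configured weights were skipped, since `pretrained: True` on the backbone may trigger it independently; the check only confirms what the log reports.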
Environment
Baidu AI Studio
aiofiles==23.2.1
aiohttp==3.9.5
aiosignal==1.3.1
aistudio-sdk @ file:///home/aistudio/aistudio_sdk-0.2.4-py3-none-any.whl#sha256=d93411cc8764e465860cbf2f97f787dddd1548595d4776c97ddf0ea787dedd81
albucore==0.0.13
albumentations==1.4.10
altair==4.2.2
annotated-types==0.6.0
anyio==4.3.0
astor==0.8.1
asttokens==2.4.1
async-timeout==4.0.3
attrdict3==2.0.2
attrs==23.2.0
Babel==2.14.0
bce-python-sdk==0.9.6
beautifulsoup4==4.12.3
blinker==1.7.0
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
coloredlogs==15.0.1
colorlog==6.8.2
comm==0.2.2
contourpy==1.2.1
cycler==0.12.1
Cython==3.0.11
datasets==2.19.0
debugpy==1.8.1
decorator==5.1.1
dill==0.3.4
easydict==1.13
entrypoints==0.4
exceptiongroup==1.2.1
executing==2.0.1
fastapi==0.110.2
ffmpy==0.3.2
filelock==3.13.4
fire==0.6.0
Flask==3.0.3
Flask-Babel==2.0.0
flatbuffers==24.3.25
fonttools==4.51.0
frozenlist==1.4.1
fsspec==2024.3.1
future==1.0.0
gitdb==4.0.11
GitPython==3.1.43
gradio==3.40.0
gradio_client==0.15.1
gunicorn==22.0.0
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
huggingface-hub==0.22.2
humanfriendly==10.0
idna==3.7
imageio==2.34.2
imgaug==0.4.0
importlib_metadata==7.1.0
importlib_resources==6.4.0
ipykernel==6.29.4
ipython==8.23.0
itsdangerous==2.2.0
jedi==0.19.1
jieba==0.42.1
Jinja2==3.1.3
joblib==1.4.0
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter_client==8.6.1
jupyter_core==5.7.2
kiwisolver==1.4.5
lazy_loader==0.4
linkify-it-py==2.0.3
lmdb==1.5.1
lxml==5.2.2
markdown-it-py==2.2.0
MarkupSafe==2.1.5
matplotlib==3.8.4
matplotlib-inline==0.1.7
mdit-py-plugins==0.3.3
mdurl==0.1.1
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.12.2
nest-asyncio==1.6.0
networkx==3.3
numpy==1.26.4
onnx==1.16.0
onnxruntime==1.17.3
opencv-contrib-python==4.10.0.84
opencv-python==4.9.0.80
opencv-python-headless==4.10.0.84
opt-einsum==3.3.0
orjson==3.10.1
packaging==24.0
paddle2onnx==1.2.1
paddlefsl==1.1.0
paddlehub==2.4.0
paddlenlp==2.5.2
paddleocr==2.8.1
paddlepaddle-gpu @ file:///tmp/paddlepaddle_gpu-2.5.2-cp310-cp310-linux_x86_64.whl#sha256=2b4a84c853c7c88ddf4984c667bfcb824cc8a28a674448099452f50c686cc1bb
pandas==2.2.2
parso==0.8.4
pexpect==4.9.0
pickleshare==0.7.5
pillow==10.3.0
platformdirs==4.2.0
prettytable==3.10.0
prompt-toolkit==3.0.43
protobuf==3.20.3
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==16.0.0
pyarrow-hotfix==0.6
pybind11==2.12.0
pyclipper==1.3.0.post5
pycryptodome==3.20.0
pydantic==2.7.0
pydantic_core==2.18.1
pydeck==0.9.1
pydub==0.25.1
Pygments==2.17.2
Pympler==1.0.1
pypandoc==1.13
pyparsing==3.1.2
python-dateutil==2.9.0.post0
python-docx==1.1.2
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
pyzmq==26.0.2
rapidfuzz==3.9.6
rarfile==4.2
referencing==0.34.0
requests==2.31.0
rich==13.7.1
rpds-py==0.18.0
ruff==0.4.1
safetensors==0.4.3
scikit-image==0.24.0
scikit-learn==1.4.2
scipy==1.13.0
semantic-version==2.10.0
semver==3.0.2
sentencepiece==0.2.0
seqeval==1.2.2
shapely==2.0.5
shellingham==1.5.4
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
soupsieve==2.5
stack-data==0.6.3
starlette==0.37.2
streamlit==1.13.0
streamlit-image-comparison==0.0.4
sympy==1.12
termcolor==2.4.0
threadpoolctl==3.4.0
tifffile==2024.7.24
toml==0.10.2
tomli==2.0.1
tomlkit==0.12.0
tool-helpers==0.1.1
toolz==0.12.1
tornado==6.4
tqdm==4.66.2
traitlets==5.14.3
typer==0.12.3
typing_extensions==4.11.0
tzdata==2024.1
tzlocal==5.2
uc-micro-py==1.0.3
urllib3==2.2.1
uvicorn==0.29.0
validators==0.28.3
visualdl==2.4.2
watchdog==4.0.1
wcwidth==0.2.13
websockets==11.0.3
Werkzeug==3.0.2
xxhash==3.4.1
yacs==0.1.8
yarl==1.9.4
zipp==3.19.2
Minimal Reproducible Example
Same command and log output as shown in the Bug section above.
Additional
No response
Are you willing to submit a PR?
- [ ] Yes I'd like to help by submitting a PR!