PaddleOCR
Code bug while training ParseQ in PaddleOCR
🔎 Search before asking
- [X] I have searched the PaddleOCR Docs and found no similar bug report.
- [X] I have searched the PaddleOCR Issues and found no similar bug report.
- [X] I have searched the PaddleOCR Discussions and found no similar bug report.
🐛 Bug (Problem Description)
I encountered an error while trying to reproduce ParseQ with PaddleOCR. It appears to be a bug in the code; the details are below. Here is the config file:
Global:
  use_gpu: True
  epoch_num: 100
  log_smooth_window: 20
  print_batch_step: 5
  save_model_dir: ./output/rec/parseq_cty_v1
  save_epoch_step: 3
  eval_batch_step: [0, 500]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img: doc/imgs_words_en/word_10.png
  character_dict_path: ppocr/utils/dict/parseq_dict_mixlang.txt
  character_type: ch
  max_text_length: 35 # 35
  num_heads: 8
  infer_mode: False
  use_space_char: False
  save_res_path: ./output/rec/predicts_parseq.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: OneCycle
    max_lr: 0.0007
Architecture:
  model_type: rec
  algorithm: ParseQ
  in_channels: 3
  Transform:
  Backbone:
    name: ViTParseQ
    img_size: [32, 128]
    patch_size: [4, 8]
    embed_dim: 384
    depth: 12
    num_heads: 6
    mlp_ratio: 4
    in_channels: 3
  Head:
    name: ParseQHead
    # Architecture
    max_text_length: 35
    embed_dim: 384
    dec_num_heads: 12
    dec_mlp_ratio: 4
    dec_depth: 1
    # Training
    perm_num: 6
    perm_forward: true
    perm_mirrored: true
    dropout: 0.1
    # Decoding mode (test)
    decode_ar: true
    refine_iters: 1
Loss:
  name: ParseQLoss
PostProcess:
  name: ParseQLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
  is_filter: True
Train:
  dataset:
    name: LMDBDataSet
    data_dir: /mnt/workspace/workgroup/sukunming/code/parseq/data/train/synth
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - ParseQRecAug:
          aug_type: 0 # or 1
      - ParseQLabelEncode:
      - SVTRRecResizeImg:
          image_shape: [3, 32, 128]
          padding: False
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 192
    drop_last: True
    num_workers: 4
Eval:
  dataset:
    name: LMDBDataSet
    data_dir: /mnt/workspace/workgroup/sukunming/code/parseq/data/val_label_data/synth
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - ParseQLabelEncode: # Class handling label
      - SVTRRecResizeImg:
          image_shape: [3, 32, 128]
          padding: False
      - KeepKeys:
          keep_keys: ['image', 'label', 'length']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 384
    num_workers: 4
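Several values in this config have to agree across sections (text length, embedding width, head counts, image and patch sizes, resize shape). The sketch below checks a few such relationships on the parsed config. The rule set and the helper name 'check_parseq_config' are assumptions for illustration, not PaddleOCR's own validation, and 'example_cfg' copies only a subset of the values above.

```python
from typing import Any, Dict, List

# A subset of the values from the config above, written as the nested dict
# that a YAML loader (e.g. yaml.safe_load) would produce.
example_cfg: Dict[str, Any] = {
    "Global": {"max_text_length": 35},
    "Architecture": {
        "Backbone": {"embed_dim": 384, "num_heads": 6,
                     "img_size": [32, 128], "patch_size": [4, 8]},
        "Head": {"max_text_length": 35, "embed_dim": 384, "dec_num_heads": 12},
    },
    "Train": {"dataset": {"transforms": [
        {"ParseQLabelEncode": None},
        {"SVTRRecResizeImg": {"image_shape": [3, 32, 128], "padding": False}},
    ]}},
    "Eval": {"dataset": {"transforms": [
        {"SVTRRecResizeImg": {"image_shape": [3, 32, 128], "padding": False}},
    ]}},
}


def check_parseq_config(cfg: Dict[str, Any]) -> List[str]:
    """Return warnings for a few ASSUMED consistency rules.

    Hypothetical helper; this is not PaddleOCR's own validation logic.
    """
    warnings: List[str] = []
    backbone = cfg["Architecture"]["Backbone"]
    head = cfg["Architecture"]["Head"]

    # Assumed: labels are encoded up to Global.max_text_length while the head
    # uses its own max_text_length, so the two are kept equal.
    if cfg["Global"]["max_text_length"] != head["max_text_length"]:
        warnings.append("Global.max_text_length != Head.max_text_length")

    # The decoder attends over the encoder output, so the widths must agree.
    if backbone["embed_dim"] != head["embed_dim"]:
        warnings.append("Backbone.embed_dim != Head.embed_dim")

    # Multi-head attention splits the embedding evenly across heads.
    for name, dim, n_heads in (
        ("Backbone", backbone["embed_dim"], backbone["num_heads"]),
        ("Head", head["embed_dim"], head["dec_num_heads"]),
    ):
        if dim % n_heads != 0:
            warnings.append(f"{name}: embed_dim is not divisible by its head count")

    # The input image should tile exactly into patches.
    if any(s % p for s, p in zip(backbone["img_size"], backbone["patch_size"])):
        warnings.append("img_size is not divisible by patch_size")

    # The resize transform outputs (C, H, W); H and W should match img_size.
    for split in ("Train", "Eval"):
        for op in cfg[split]["dataset"]["transforms"]:
            resize = op.get("SVTRRecResizeImg")
            if resize and list(resize["image_shape"][1:]) != list(backbone["img_size"]):
                warnings.append(f"{split}: resize shape != Backbone.img_size")
    return warnings


print(check_parseq_config(example_cfg) or "no warnings")  # no warnings
```

With the values from this report, none of these checks fire.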
Each folder under 'data_dir' contains two files, 'data.mdb' and 'lock.mdb', which were generated with 'python tools/create_lmdb_dataset.py /path/to/img/root /path/to/gt /path/to/save/lmdb'.
Using the setup above, I ran "python3 tools/train.py -c configs/rec/rec_vit_parseq_cty_v1.yml" and hit an error at line 498 of 'ppocr/modeling/heads/rec_parseq_head.py'.
[Screenshots not preserved: the error at line 498, the values of targets[0] and targets[1], and one further screenshot.]
Is something wrong with the code, and how can I fix it? Thanks a lot!

Regarding the problem above, I believe it is caused by indexing a scalar with [0]. On the one hand, I would not expect the official maintainers to make such a simple mistake; on the other hand, it did happen, which is very strange. Please help resolve this. Thank you!
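The failure described here, indexing a scalar with [0], can be reproduced outside PaddleOCR with NumPy. A minimal sketch, assuming the failing line takes element [0] of a full reduction (such as a max over label lengths) after converting it with '.numpy()': depending on the framework version, that reduction may come back as a shape-[1] array or as a 0-D array, and only the former can be indexed with [0]. The helper names 'max_step_indexed' and 'max_step_robust' are hypothetical; '.item()' is shown as one shape-agnostic alternative.

```python
import numpy as np

# Hypothetical stand-ins for a reduced label-length value after .numpy().
label_len = np.array([5, 12, 7], dtype=np.int64)
reduced_old = np.array([label_len.max()])  # shape (1,): indexable with [0]
reduced_new = np.array(label_len.max())    # shape (): 0-D, no axis to index


def max_step_indexed(reduced: np.ndarray) -> int:
    """Fragile pattern: assumes a 1-D result and takes element [0]."""
    return int(reduced[0]) + 2


def max_step_robust(reduced: np.ndarray) -> int:
    """Shape-agnostic: .item() accepts any single-element array."""
    return int(reduced.item()) + 2


print(max_step_indexed(reduced_old))  # 14
print(max_step_robust(reduced_old))   # 14
print(max_step_robust(reduced_new))   # 14
try:
    max_step_indexed(reduced_new)
except IndexError as err:
    print("IndexError:", err)
```

If the line at 498 does have this form, the analogous change in Paddle would be to call '.item()' on the reduced tensor instead of indexing [0]; this is a suggestion to verify against the installed Paddle version, not a confirmed upstream fix.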
🏃‍♂️ Environment
(base) /mnt/workspace/workgroup/sukunming/code/parseq/data/val_label_data/synth> uname -a
Linux dsw84519-5b9bbbb4d-mwmw2 5.10.112-005.ali5000.al8.x86_64 #1 SMP Tue Jun 28 10:43:38 CST 2022 x86_64 x86_64 x86_64 GNU/Linux
(base) /mnt/workspace/workgroup/sukunming/code/parseq/data/val_label_data/synth> pip list
Package Version
addict 2.4.0
aiohttp 3.9.1
aiosignal 1.3.1
albucore 0.0.13
albumentations 1.4.14
alibabacloud-credentials 0.3.2
alibabacloud-endpoint-util 0.0.3
alibabacloud-gateway-spi 0.0.1
alibabacloud-openapi-util 0.2.2
alibabacloud-pai-dlc20201203 1.0.0
alibabacloud-paistudio20220112 1.1.2
alibabacloud-tea 0.3.5
alibabacloud-tea-openapi 0.3.8
alibabacloud-tea-util 0.3.11
alibabacloud-tea-xml 0.0.2
alipai 0.1.7
aliyun-log-python-sdk 0.8.15
aliyun-python-sdk-core 2.14.0
aliyun-python-sdk-kms 2.16.2
aliyun-python-sdk-sts 3.1.2
annotated-types 0.7.0
astor 0.8.1
astroid 3.0.2
asttokens 2.4.1
attrs 23.1.0
autopep8 1.7.0
boltons 23.0.0
brotlipy 0.7.0
cachetools 5.3.2
certifi 2023.11.17
cffi 1.15.1
charset-normalizer 2.0.4
cloudpickle 3.0.0
colorama 0.4.6
comm 0.2.1
common-io 0.4.0+tunnel
conda 23.9.0
conda-content-trust 0.2.0
conda-libmamba-solver 23.9.1
conda-package-handling 2.2.0
conda_package_streaming 0.9.0
configparser 6.0.0
contextlib2 21.6.0
contourpy 1.2.0
crcmod 1.7
cryptography 37.0.4
cvxopt 1.3.2
cycler 0.12.1
Cython 3.0.6
datasets 2.16.1
dateparser 1.2.0
debugpy 1.8.0
decorator 5.1.1
dill 0.3.7
dnspython 2.4.2
eas-prediction 0.12
easy-rec 0.1.6
einops 0.7.0
elastic-transport 8.11.0
elasticsearch 8.11.1
eval_type_backport 0.2.0
executing 2.0.1
fairscale 0.4.13
filelock 3.13.1
flake8 7.0.0
fonttools 4.47.0
frozenlist 1.4.1
fsspec 2023.12.2
future 0.18.3
gast 0.5.4
graphviz 0.20.1
huggingface-hub 0.20.2
hyperopt 0.1.2
idna 3.4
imageio 2.35.1
importlib-metadata 7.0.1
ipykernel 6.28.0
ipython 8.20.0
ipywidgets 8.1.1
isort 5.13.2
jedi 0.19.1
Jinja2 3.1.2
jmespath 0.10.0
joblib 1.3.2
json-tricks 3.17.3
jsonpatch 1.32
jsonpointer 2.1
jupyter_client 8.6.0
jupyter_core 5.7.1
jupyterlab-widgets 3.0.9
kiwisolver 1.4.5
lazy_loader 0.4
lazy-object-proxy 1.6.0
libmambapy 1.5.1
lightning-utilities 0.11.6
MarkupSafe 2.1.3
matplotlib 3.8.2
matplotlib-inline 0.1.6
mccabe 0.7.0
modelscope 1.11.0
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
nest-asyncio 1.5.9
networkx 3.2.1
numpy 1.26.2
opencv-contrib-python 4.6.0.66
opencv-python 4.6.0.66
opencv-python-headless 4.10.0.84
opt-einsum 3.3.0
oss2 2.18.3
packaging 23.1
pai-nni 2.6
pandas 2.1.4
parso 0.8.3
patsy 0.5.5
pexpect 4.9.0
pillow 10.2.0
pip 23.3.1
platformdirs 4.1.0
plotly 5.18.0
pluggy 1.0.0
prettytable 3.9.0
prompt-toolkit 3.0.43
protobuf 3.20.3
psutil 5.9.7
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 14.0.2
pyarrow-hotfix 0.6
pybind11 2.10.4
pybind11-global 2.10.4
pycodestyle 2.11.1
pycosat 0.6.6
pycparser 2.21
pycryptodome 3.19.0
pydantic 2.8.2
pydantic_core 2.20.1
pyflakes 3.2.0
Pygments 2.17.2
pylint 3.0.3
pymongo 4.6.1
pyodps 0.11.4.1
pyOpenSSL 23.2.0
pyparsing 3.1.1
PySocks 1.7.1
python-dateutil 2.8.2
PythonWebHDFS 0.2.3
pytorch-lightning 2.4.0
pytz 2023.3.post1
PyYAML 6.0.1
pyzmq 25.1.2
regex 2023.12.25
requests 2.31.0
responses 0.24.1
ruamel.yaml 0.17.21
safetensors 0.4.4
schema 0.7.5
scikit-image 0.24.0
scikit-learn 1.3.2
scipy 1.11.4
seaborn 0.13.0
setuptools 68.2.2
simplejson 3.19.2
six 1.16.0
sortedcontainers 2.4.0
stack-data 0.6.3
statsmodels 0.14.1
sympy 1.12
tabulate 0.9.0
tenacity 8.2.3
terminado 0.8.3
threadpoolctl 3.2.0
tifffile 2024.8.10
timm 1.0.8
toml 0.10.2
tomli 2.0.1
tomlkit 0.12.3
torch 2.1.0+cu118
torchaudio 2.1.0+cu118
torchmetrics 1.4.1
torchvision 0.16.0+cu118
tornado 6.4
tqdm 4.66.1
training-utils 1.0.6
traitlets 5.14.1
triton 2.1.0
truststore 0.8.0
typing_extensions 4.9.0
tzdata 2023.3
tzlocal 5.2
urllib3 1.26.16
wcwidth 0.2.13
websockets 12.0
wheel 0.41.2
widgetsnbextension 4.0.9
xgboost 2.0.3
xlrd 2.0.1
xxhash 3.4.1
yapf 0.40.2
yarl 1.9.4
zipp 3.17.0
zstandard 0.19.0
🌰 Minimal Reproducible Example
Sorry, the network is blocked on the following page: