
How to set parameters for single-GPU training

Open tianyongliu opened this issue 1 year ago • 1 comments

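I currently have only one GPU. How should I change the training parameters? Could you provide a demo? Thanks!

Also, regarding the data for training the OCR model, in /root/DAVAR-Lab-OCR-main/demo/table_recognition/lgpma/configs/ocr_models/rcg_res32_bilstm_attn_pubtabnet_sensitive.py: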

ann_file='path/to/PubTabNet/recog/datalist_recog_val.json', img_prefix='path/to/PubTabNet/recog/',

ann_file='path/to/PubTabNet/recog/datalist_recog_val.json', img_prefix='path/to/PubTabNet/recog/',

Where can the training data referenced here be downloaded, or how can it be prepared?

Thanks!

tianyongliu, Mar 03 '23 01:03

However, after training entirely on the sample data, I get an error. This is part of the configuration in lgpma_pub.py:

_base_ = "./lgpma_base.py"

data = dict(
    samples_per_gpu=3,
    workers_per_gpu=1,
    train=dict(
        ann_file='/root/davar_datalist_example/davar_datalist_demo.json',
        img_prefix='/root/davar_datalist_example/images'),
    # According to the evaluation metric, select the appropriate validation dataset format.
    val=dict(
        ann_file='/root/davar_datalist_example/davar_datalist_demo.json',
        img_prefix='/root/davar_datalist_example/images'),
    test=dict(
        samples_per_gpu=1,
        ann_file='/root/davar_datalist_example/davar_datalist_demo.json',
        img_prefix='/root/davar_datalist_example/images')
)
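With one GPU, the effective batch size is samples_per_gpu multiplied by the number of GPUs, which is 3 for the config above. A common heuristic when the GPU count changes is to scale the learning rate in proportion to the effective batch size (the linear scaling rule). Whether that is appropriate for LGPMA is not confirmed here. The sketch below assumes hypothetical reference values (`ref_lr`, `ref_gpus`, `ref_samples_per_gpu`); the real ones would have to be read from lgpma_base.py.

```python
def effective_batch_size(num_gpus, samples_per_gpu):
    """Total number of samples consumed per optimizer step."""
    return num_gpus * samples_per_gpu


def scale_lr(ref_lr, ref_gpus, ref_samples_per_gpu, num_gpus, samples_per_gpu):
    """Scale a reference learning rate linearly with the effective batch size."""
    ref_batch = effective_batch_size(ref_gpus, ref_samples_per_gpu)
    new_batch = effective_batch_size(num_gpus, samples_per_gpu)
    return ref_lr * new_batch / ref_batch


# Hypothetical reference schedule: lr=0.01 tuned for 8 GPUs x 3 samples each.
# Only samples_per_gpu=3 comes from the config above; the rest is assumed.
single_gpu_lr = scale_lr(0.01, 8, 3, num_gpus=1, samples_per_gpu=3)
print(single_gpu_lr)
```

The resulting value would then replace the learning rate in the optimizer settings; treat it as a starting point for tuning, not as a guaranteed equivalent.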

This is the printed output:

2023-03-03 11:15:32,466 - davarocr - INFO - Saving checkpoint at 1 epochs
2023-03-03 11:15:41,018 - davarocr - INFO - Saving checkpoint at 2 epochs
2023-03-03 11:15:49,519 - davarocr - INFO - Saving checkpoint at 3 epochs
2023-03-03 11:15:59,133 - davarocr - INFO - Saving checkpoint at 4 epochs
2023-03-03 11:16:08,227 - davarocr - INFO - Saving checkpoint at 5 epochs
2023-03-03 11:16:17,693 - davarocr - INFO - Saving checkpoint at 6 epochs
2023-03-03 11:16:26,505 - davarocr - INFO - Saving checkpoint at 7 epochs
2023-03-03 11:16:34,652 - davarocr - INFO - Saving checkpoint at 8 epochs
2023-03-03 11:16:43,591 - davarocr - INFO - Saving checkpoint at 9 epochs
2023-03-03 11:16:52,845 - davarocr - INFO - Saving checkpoint at 10 epochs
2023-03-03 11:17:00,441 - davarocr - INFO - Saving checkpoint at 11 epochs
2023-03-03 11:17:09,522 - davarocr - INFO - Saving checkpoint at 12 epochs
Traceback (most recent call last):
  File "/root/miniconda/envs/d2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/miniconda/envs/d2/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/miniconda/envs/d2/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/root/miniconda/envs/d2/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda/envs/d2/bin/python', '-u', '/root/workspace/DAVAR-Lab-OCR-main/tools/train.py', '--local_rank=0', './configs/lgpma_pub.py', '--no-validate', '--launcher', 'pytorch']' died with <Signals.SIGSEGV: 11>.
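The launcher only reports that the child process died with SIGSEGV; it does not show where the crash happened. One way to get more detail is Python's standard `faulthandler` module, which dumps the Python-level traceback when a fatal signal arrives. This is offered as a diagnostic aid, not a confirmed fix. The helper name `enable_crash_traceback` is hypothetical, and it is assumed the call would be placed near the top of the training script.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Dump Python tracebacks for all threads on fatal signals such as SIGSEGV.

    `stream` must be an open file with a valid file descriptor and must stay
    open for as long as the handler is enabled. Defaults to sys.stderr.
    """
    if stream is None:
        stream = sys.stderr
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


# Assumed usage: call once, early in the training entry point, for example
#   enable_crash_traceback()
```

The same effect can be obtained without editing any code by setting the environment variable `PYTHONFAULTHANDLER=1` before launching, or by running the interpreter with `-X faulthandler`.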

tianyongliu, Mar 03 '23 03:03