
lgpma train slow

Open gulixin0922 opened this issue 2 years ago • 3 comments

Hello, I used 8 V100 GPUs to train LGPMA on the PubTabNet dataset. However, training is very slow and will take almost 8 days. Any suggestions?

(screenshot: Screen Shot 2022-05-19 17:10)

CPU usage (screenshot: Screen Shot 2022-05-19 17:11)

gulixin0922 avatar May 19 '22 09:05 gulixin0922

Training the current model on PubTabNet does indeed take 8 days for 12 epochs (as shown in the log we provided), mainly because PubTabNet contains so many samples. If you want to train the model on other datasets, you can directly load the trained model and fine-tune it for a few epochs.

If you want faster training on PubTabNet just to verify correctness, you can reduce the model size or memory cost by using a smaller backbone, fewer anchors, or a smaller img_scale. (Performance will be slightly affected.)
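For reference, the kinds of overrides suggested above might look like the following in an mmdet-style config. This is only an illustrative sketch; the field names and values are assumptions, not the actual contents of lgpma_pub.py or lgpma_base.py, so check those files for the real structure:

```python
# Illustrative speed-oriented overrides (field names are assumptions;
# consult lgpma_base.py for the actual config keys).
model = dict(
    backbone=dict(type='ResNet', depth=18),  # smaller backbone than ResNet-50
    rpn_head=dict(
        # fewer anchor scales/ratios -> fewer anchors per location
        anchor_generator=dict(scales=[4], ratios=[0.5, 1.0])),
)
train_pipeline = [
    # smaller img_scale reduces memory and per-iteration time
    dict(type='Resize', img_scale=(800, 600), keep_ratio=True),
]
```

Each of these trades some accuracy for speed, so they are best used for a quick correctness check rather than a final model.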

qiaoliang6 avatar May 19 '22 14:05 qiaoliang6

Thank you for your reply! I will try a smaller backbone, fewer anchors, and a smaller img_scale.

Could you provide the ResNet-50 pretrained model ('path/to/resnet50-19c8e357.pth')? I have downloaded a ResNet-50 pretrained model from mmclassification, but some of the variable names do not match.

gulixin0922 avatar May 20 '22 02:05 gulixin0922
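When checkpoint variable names do not match, a common workaround is to remap the state_dict keys before loading. Here is a minimal sketch; the exact prefix mismatch depends on the checkpoint (the `backbone.` prefix below is an assumption), so inspect the keys on both sides first:

```python
def remap_checkpoint_keys(state_dict, prefix="backbone."):
    """Strip a prefix from checkpoint keys so they match the model's names.

    For example, some mmclassification checkpoints store weights under
    'backbone.conv1.weight' while the target model expects 'conv1.weight'.
    """
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith(prefix):
            remapped[key[len(prefix):]] = value  # drop the mismatched prefix
        else:
            remapped[key] = value
    return remapped

# Hypothetical usage with PyTorch (checkpoint path is illustrative):
# import torch
# ckpt = torch.load('path/to/checkpoint.pth', map_location='cpu')
# state = ckpt.get('state_dict', ckpt)  # mm* checkpoints nest under 'state_dict'
# model.load_state_dict(remap_checkpoint_keys(state), strict=False)
```

Loading with `strict=False` tolerates any remaining non-matching keys, but it is worth printing the reported missing/unexpected keys to confirm the backbone weights actually loaded.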

Hi gulixin0922, I am training the LGPMA model but I keep getting this error: simple_test() got an unexpected keyword argument 'gt_bboxes'. Could you please share the config files lgpma_pub.py and lgpma_base.py that you used to train on the PubTabNet dataset? Thank you very much.

mrtranducdung avatar Aug 24 '22 04:08 mrtranducdung