
lgpma train slow

Open gulixin0922 opened this issue 2 years ago • 3 comments

Hello, I used 8 V100 GPUs to train LGPMA on the PubTabNet dataset. However, training is very slow and will take almost 8 days. Any suggestions?

(screenshot: Screen Shot 2022-05-19 17:10)

CPU usage (screenshot: Screen Shot 2022-05-19 17:11)

gulixin0922 avatar May 19 '22 09:05 gulixin0922

Training the current model on PubTabNet does indeed take 8 days for 12 epochs (as shown in the log we provided), mainly because PubTabNet contains so many samples. If you want to train the model on other datasets, you can directly load the trained model and fine-tune it for a few epochs.

If you want faster training on PubTabNet just to verify correctness, you can reduce the model size or memory cost by using a smaller backbone, fewer anchors, or a smaller img_scale. (Performance will be slightly affected.)
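For reference, the kinds of overrides suggested above might look like the following in an mmdet-style config. This is only an illustrative sketch; the field names and values are assumptions, not the actual contents of lgpma_pub.py or lgpma_base.py, so check those files for the real structure:

```python
# Illustrative speed-oriented overrides (field names are assumptions;
# consult lgpma_base.py for the actual config keys).
model = dict(
    backbone=dict(type='ResNet', depth=18),  # smaller backbone than ResNet-50
    rpn_head=dict(
        # fewer anchor scales/ratios -> fewer anchors per location
        anchor_generator=dict(scales=[4], ratios=[0.5, 1.0])),
)
train_pipeline = [
    # smaller img_scale reduces memory and per-iteration time
    dict(type='Resize', img_scale=(800, 600), keep_ratio=True),
]
```

Each of these trades some accuracy for speed, so they are best used for a quick correctness check rather than a final model.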

qiaoliang6 avatar May 19 '22 14:05 qiaoliang6

Thank you for your reply! I will try a smaller backbone, fewer anchors, and a smaller img_scale.

Could you provide the ResNet-50 pretrained model ('path/to/resnet50-19c8e357.pth')? I have downloaded a ResNet-50 pretrained model from mmclassification, but some of the variable names do not match.

gulixin0922 avatar May 20 '22 02:05 gulixin0922
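When checkpoint variable names do not match, a common workaround is to remap the state_dict keys before loading. Here is a minimal sketch; the exact prefix mismatch depends on the checkpoint (the `backbone.` prefix below is an assumption), so inspect the keys on both sides first:

```python
def remap_checkpoint_keys(state_dict, prefix="backbone."):
    """Strip a prefix from checkpoint keys so they match the model's names.

    For example, some mmclassification checkpoints store weights under
    'backbone.conv1.weight' while the target model expects 'conv1.weight'.
    """
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith(prefix):
            remapped[key[len(prefix):]] = value  # drop the mismatched prefix
        else:
            remapped[key] = value
    return remapped

# Hypothetical usage with PyTorch (checkpoint path is illustrative):
# import torch
# ckpt = torch.load('path/to/checkpoint.pth', map_location='cpu')
# state = ckpt.get('state_dict', ckpt)  # mm* checkpoints nest under 'state_dict'
# model.load_state_dict(remap_checkpoint_keys(state), strict=False)
```

Loading with `strict=False` tolerates any remaining non-matching keys, but it is worth printing the reported missing/unexpected keys to confirm the backbone weights actually loaded.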

Hi gulixin0922, I am training the LGPMA model but I keep getting this error: simple_test() got an unexpected keyword argument 'gt_bboxes'. Could you please share the config files lgpma_pub.py and lgpma_base.py that you used to train on the PubTabNet dataset? Thank you very much.

mrtranducdung avatar Aug 24 '22 04:08 mrtranducdung