PaddleOCR
DB_MV3 loss won't decrease beyond a certain point
Problem Description
I am trying to overfit the db_mv3 model on a single scanned A4-size image with fairly straightforward annotations. The default Paddle detector performs almost perfectly when I run inference on it with default settings, but its outputs are sentence-level, whereas I need character- or word-level outputs. So I labeled the image with character-level annotations and got the following results.
MobileNet backbone (3600 epochs, char-level annotations)
[2024/07/10 19:51:02] ppocr INFO: metric eval ***************
[2024/07/10 19:51:02] ppocr INFO: precision:0.677007299270073
[2024/07/10 19:51:02] ppocr INFO: recall:0.25255275697753576
[2024/07/10 19:51:02] ppocr INFO: hmean:0.3678730788299455
[2024/07/10 19:51:02] ppocr INFO: fps:4.403861761942011
ResNet50 backbone (776 epochs)
[2024/07/10 19:48:29] ppocr INFO: metric eval ***************
[2024/07/10 19:48:29] ppocr INFO: precision:0.5193798449612403
[2024/07/10 19:48:29] ppocr INFO: recall:0.1824370319945541
[2024/07/10 19:48:29] ppocr INFO: hmean:0.2700251889168766
[2024/07/10 19:48:29] ppocr INFO: fps:4.3201766673808075
Then I retrained with a mix of word-level and some character-level annotations, but that didn't help much either.
MobileNet backbone (3000 epochs, char+word-level mix annotations)
[2024/07/10 21:06:45] ppocr INFO: metric eval ***************
[2024/07/10 21:06:45] ppocr INFO: precision:0.37731958762886597
[2024/07/10 21:06:45] ppocr INFO: recall:0.2846034214618974
[2024/07/10 21:06:45] ppocr INFO: hmean:0.32446808510638303
[2024/07/10 21:06:45] ppocr INFO: fps:4.316308784091119
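As a sanity check on how to read these numbers, the logged hmean values are consistent with the harmonic mean of precision and recall. The sketch below assumes that definition; the `hmean` helper and the run names are only illustrative labels, and the figures are copied from the three eval logs above.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


# (precision, recall, logged hmean) copied from the eval logs above.
# The keys are illustrative labels, not names used by PaddleOCR.
runs = {
    "mobilenet_char": (0.677007299270073, 0.25255275697753576, 0.3678730788299455),
    "resnet50_char": (0.5193798449612403, 0.1824370319945541, 0.2700251889168766),
    "mobilenet_mix": (0.37731958762886597, 0.2846034214618974, 0.32446808510638303),
}

for name, (p, r, logged) in runs.items():
    computed = hmean(p, r)
    # Compare the recomputed value with the one printed in the log.
    print(f"{name}: computed={computed:.6f} logged={logged:.6f}")
```

In all three runs recall is the lower of the two values, so hmean is held down mainly by ground-truth boxes that were not matched rather than by false detections.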
From my observation of the training logs, the loss seems to get stuck at some point (around either 1.0 or 0.8) and does not drop any further. I was thinking maybe it's stuck in a local minimum or something, but there doesn't seem to be an option to use techniques like momentum, etc. Can someone please help me figure this out or give me tips to improve my training?
PS. I cannot increase the dataset size (or share it, since it's confidential, but you can imagine a fairly easy-to-detect A4 scan of a normal-quality page). Any help is welcome. Thank you!