PaddleOCR
added sliding window for large image inference
PaddleOCR does not work on large documents/images. This feature adds a sliding window inference method: it splits the input image into slices and runs detection+recognition on each slice. It takes longer, as expected, but unlike the default code it gives correct results. The vertical and horizontal strides are adjustable by the user.
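For readers who want a picture of the slicing step, here is a minimal sketch, not the code in this PR. It assumes each window is exactly one stride wide and one stride tall (no overlap), and the helper names `sliding_windows` and `to_global` are hypothetical.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # x0, y0, x1, y1 in full-image pixels


def sliding_windows(
    width: int, height: int, horizontal_stride: int, vertical_stride: int
) -> Iterator[Window]:
    """Yield non-overlapping windows that tile the image, row by row.

    Assumption: window size equals the stride; windows on the right and
    bottom edges are clipped to the image bounds.
    """
    for y0 in range(0, height, vertical_stride):
        for x0 in range(0, width, horizontal_stride):
            x1 = min(x0 + horizontal_stride, width)
            y1 = min(y0 + vertical_stride, height)
            yield x0, y0, x1, y1


def to_global(
    box: List[Tuple[int, int]], x0: int, y0: int
) -> List[Tuple[int, int]]:
    """Shift a box detected inside a slice back to full-image coordinates."""
    return [(px + x0, py + y0) for px, py in box]


# Example: an image 5088 wide and 3600 tall (orientation assumed),
# sliced with a horizontal stride of 300 and a vertical stride of 500.
windows = list(sliding_windows(5088, 3600, 300, 500))
```

Each window would be cropped from the image, passed through detection+recognition, and its boxes shifted back with `to_global` before the results are combined.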
Output on an image of dimensions (5088x3600):
Without sliding window:
With sliding window:
Note: It could use a postprocessing step where the adjacent detections are merged into one, if needed.
Thanks for your contribution!
When using the sliding window op, how do you avoid word fragmentation?
Word fragmentation can be minimized by adjusting the stride values and adding a postprocessing step that merges adjacent detections. In the example above, 'Santa Monic' and 'a Music Camp' would be merged into one detection with an extended bounding box and the text 'Santa Monica Music Camp'.
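To illustrate the kind of merge described here, below is a rough sketch under stated assumptions. It is not the merging code in this PR. It assumes axis-aligned boxes given as `(x0, y0, x1, y1, text)`, and the function name `merge_adjacent` and the sample coordinates are made up for illustration. The threshold names follow the `merge_x_thres` and `merge_y_thres` keys used in this thread.

```python
from typing import List, Tuple

Detection = Tuple[int, int, int, int, str]  # x0, y0, x1, y1, text


def merge_adjacent(
    dets: List[Detection], merge_x_thres: int, merge_y_thres: int
) -> List[Detection]:
    """Greedily merge fragments that sit on the same line and nearly touch.

    Two detections are merged when their top and bottom edges differ by at
    most merge_y_thres and the left edge of one lies within merge_x_thres
    of the right edge of the other.
    """
    merged: List[Detection] = []
    # Process left to right so text is concatenated in reading order.
    for x0, y0, x1, y1, text in sorted(dets, key=lambda d: d[0]):
        for i, (mx0, my0, mx1, my1, mtext) in enumerate(merged):
            same_line = (
                abs(y0 - my0) <= merge_y_thres and abs(y1 - my1) <= merge_y_thres
            )
            touching = abs(x0 - mx1) <= merge_x_thres
            if same_line and touching:
                # Extend the existing box and append the fragment's text.
                merged[i] = (
                    mx0, min(my0, y0), max(mx1, x1), max(my1, y1), mtext + text
                )
                break
        else:
            merged.append((x0, y0, x1, y1, text))
    return merged


# Hypothetical coordinates for the fragments split at a slice boundary.
fragments = [
    (100, 50, 300, 90, "Santa Monic"),
    (300, 52, 520, 91, "a Music Camp"),
    (100, 200, 250, 240, "Other"),
]
result = merge_adjacent(fragments, merge_x_thres=50, merge_y_thres=35)
```

The fragments are joined without an inserted space because the split fell inside a word. A fuller implementation would need to decide when a space belongs between the merged pieces.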
hi @aspaul20, maybe you need to add the post-processing part as well to avoid individual words being separated.
Hi @GreatV, added a fix for word fragmentation
I haven't tested it yet, I'm more curious if this fix is valid for the following picture?
@aspaul20 you may need to install pre-commit and run pre-commit run --all-files
Sure, here's what the output looks like for
slice = {'horizontal_stride': 300, 'vertical_stride':500, 'merge_x_thres':50, 'merge_y_thres': 35}
Although it works well here, the slicing operator is most useful on even larger images, for instance if several copies of this page were stacked on top of each other into one image and you wanted to run OCR on it.
Here's an example
PS. I improved the merging code a little further
@aspaul20, This looks great! I'll take some time to review the code further. In the meantime, could you add some documentation to help users understand how to use the slice operation?
And you may need to fix the Contributor License Agreement (CLA) check.
Of course, I should have some documentation for you soon!
@aspaul20 Thanks for your contribution! You will receive a beautiful PaddlePaddle gift. Please provide your mailing address by filling out the following questionnaire before October 18th.
Looking forward to the future, we will walk further together in the world of open source! Click here: https://paddle.wjx.cn/vm/h4On9gJ.aspx#
