PaddleOCR

added sliding window for large image inference

Open aspaul20 opened this issue 1 year ago • 11 comments

PaddleOCR does not work well on large documents and images. This feature adds a sliding window inference method that cuts the input image into slices and runs detection and recognition on each slice. It takes longer, as expected, but unlike the default code it gives correct results on such inputs. The vertical and horizontal strides are adjustable by the user.
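As a rough illustration of the slicing idea, here is a minimal sketch in plain Python. It is not the code from this PR: the names `iter_windows` and `shift_box`, the separate window-size parameters, and the choice to clamp the last window to the image edge are all assumptions made for the example.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (x0, y0, x1, y1), hypothetical layout


def _starts(size: int, win: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the windows cover [0, size)."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    if size <= win:
        return [0]
    out = list(range(0, size - win, stride))
    out.append(size - win)  # final window sits flush with the far edge
    return out


def iter_windows(height: int, width: int, win_h: int, win_w: int,
                 vertical_stride: int, horizontal_stride: int) -> Iterator[Window]:
    """Yield windows row by row, left to right."""
    for y in _starts(height, win_h, vertical_stride):
        for x in _starts(width, win_w, horizontal_stride):
            yield x, y, min(x + win_w, width), min(y + win_h, height)


def shift_box(points: List[Tuple[int, int]], x0: int, y0: int) -> List[Tuple[int, int]]:
    """Map a box detected inside a slice back to full-image coordinates."""
    return [(px + x0, py + y0) for px, py in points]
```

Each window would be cropped from the image and passed to detection and recognition, and every resulting box would be shifted by the window origin before the results are combined. A stride smaller than the window size gives overlapping slices, which costs more time but reduces the chance of cutting a word at a slice border.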

Output on an image of dimensions (5088x3600):

Without sliding window: Screenshot from 2024-05-21 17-06-11

With sliding window: Screenshot from 2024-05-21 17-07-11

Note: It could use a postprocessing step where the adjacent detections are merged into one, if needed.

aspaul20 avatar May 21 '24 12:05 aspaul20

Thanks for your contribution!

paddle-bot[bot] avatar May 21 '24 12:05 paddle-bot[bot]

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 21 '24 12:05 CLAassistant

When using the sliding window op, how can word fragmentation be avoided?

Word fragmentation can be minimized by adjusting the stride values and adding a postprocessing step where adjacent detections are merged. In the example above, 'Santa Monic' and 'a Music Camp' would be merged into one detection with an extended bounding box and the text 'Santa Monica Music Camp'.
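A minimal sketch of such a merge step, assuming axis-aligned boxes given as (x0, y0, x1, y1) and two pixel thresholds, could look like the following. The function name `merge_adjacent`, the box layout, and the exact threshold semantics are assumptions for illustration, not the PR's implementation.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), hypothetical layout
Detection = Tuple[Box, str]       # (box, recognized text)


def merge_adjacent(dets: List[Detection], x_thres: int, y_thres: int) -> List[Detection]:
    """Greedily join fragments that lie on the same line and nearly touch."""
    merged: List[Detection] = []
    # Process left to right so text is concatenated in reading order.
    for (x0, y0, x1, y1), text in sorted(dets, key=lambda d: d[0][0]):
        for i, ((mx0, my0, mx1, my1), mtext) in enumerate(merged):
            same_line = abs(y0 - my0) <= y_thres and abs(y1 - my1) <= y_thres
            close_enough = x0 - mx1 <= x_thres  # gap (or overlap) to the right edge
            if same_line and close_enough:
                box = (mx0, min(my0, y0), max(mx1, x1), max(my1, y1))
                merged[i] = (box, mtext + text)
                break
        else:
            merged.append(((x0, y0, x1, y1), text))
    return merged


fragments = [
    ((300, 100, 420, 130), "a Music Camp"),
    ((180, 100, 300, 130), "Santa Monic"),
    ((180, 400, 260, 430), "Other"),
]
result = merge_adjacent(fragments, x_thres=50, y_thres=35)
```

Plain concatenation only works when a word is cut cleanly at a slice border. If overlapping slices each recognize some of the same characters, the merged text would contain duplicates and would need extra handling.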

aspaul20 avatar May 22 '24 04:05 aspaul20

hi @aspaul20, maybe you need to add the post-processing part as well to avoid individual words being separated.

GreatV avatar May 22 '24 04:05 GreatV

Hi @GreatV, added a fix for word fragmentation

test

aspaul20 avatar May 22 '24 08:05 aspaul20

I haven't tested it yet, I'm more curious if this fix is valid for the following picture?

2405 04788v1_1

GreatV avatar May 23 '24 06:05 GreatV

@aspaul20 you may need to install pre-commit and then run: pre-commit run --all-files

GreatV avatar May 23 '24 06:05 GreatV

Sure, here's what the output looks like for that picture, using

slice = {'horizontal_stride': 300, 'vertical_stride': 500, 'merge_x_thres': 50, 'merge_y_thres': 35}

test

Although it works well here, the slicing operator is most useful on even larger images, for instance if several pages of this paper were stacked on top of each other in a single image and you wanted to run OCR on it.

Here's an example

PS. I improved the merging code a little further

aspaul20 avatar May 23 '24 08:05 aspaul20

@aspaul20, this looks great! I'll take some time to review the code further. In the meantime, could you add some documentation to help users understand how to use the slice operation?

GreatV avatar May 23 '24 09:05 GreatV

And you may need to fix the Contributor License Agreement (CLA) check.

GreatV avatar May 23 '24 09:05 GreatV

Of course, I should have some documentation for you soon!

aspaul20 avatar May 23 '24 09:05 aspaul20

@aspaul20 Thanks for your contribution! You will receive a beautiful PaddlePaddle gift. Please provide your mailing address by filling out the following questionnaire before October 18th.

Looking forward to the future, we will walk further together in the world of open source! Click here: https://paddle.wjx.cn/vm/h4On9gJ.aspx#

luotao1 avatar Oct 15 '24 06:10 luotao1