[Question]: How can I extract text automatically segmented as per the layout

Open shubhamworks opened this issue 1 year ago • 3 comments

Describe your problem

Given a PDF with two columns, headings, and tables, I want to extract the text/OCR result separately for each individual layout segment. How can I do this directly, using only deepdoc?

shubhamworks, Apr 08 '24 10:04

You need to use the layout recognizer. Please look at the code in rag/app. Hope this helps.

Thanks for following.

KevinHuSh, Apr 08 '24 13:04

The layout recogniser only returns the layout (bounding boxes and their labels); it doesn't return the text inside each box. Is there a direct function or code for that?

shubhamworks, Apr 08 '24 17:04

This function is for this purpose: [screenshot]

KevinHuSh, Apr 09 '24 01:04
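Because the layout recognizer returns only boxes and labels, one general way to get per-segment text is to run OCR on the page and then assign each OCR text line to the layout box that covers most of it. The following is a minimal plain-Python sketch of that overlap-matching idea. It does not call deepdoc. The dict formats (`bbox` as `(x0, top, x1, bottom)`, `label`, `text`) and the helper names `overlap_ratio`, `group_text_by_layout`, and `reading_order` are hypothetical and would need adapting to whatever your layout and OCR steps actually return.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical box format: (x0, top, x1, bottom) in page pixel coordinates.
Box = Tuple[float, float, float, float]


def overlap_ratio(inner: Box, outer: Box) -> float:
    """Fraction of `inner`'s area that lies inside `outer` (0.0 to 1.0)."""
    x0, top = max(inner[0], outer[0]), max(inner[1], outer[1])
    x1, bottom = min(inner[2], outer[2]), min(inner[3], outer[3])
    if x1 <= x0 or bottom <= top:
        return 0.0
    inner_area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return (x1 - x0) * (bottom - top) / inner_area if inner_area > 0 else 0.0


def group_text_by_layout(layouts: List[Dict], text_boxes: List[Dict],
                         min_overlap: float = 0.5):
    """Attach each OCR text box to the layout region covering most of it.

    layouts:    [{"bbox": Box, "label": str}, ...]  (assumed format)
    text_boxes: [{"bbox": Box, "text": str}, ...]   (assumed format)
    Returns (segments, unassigned_texts). Text boxes whose best overlap is
    below `min_overlap` are returned separately instead of being forced
    into a region.
    """
    buckets: Dict[int, List[Dict]] = defaultdict(list)
    for tb in text_boxes:
        best_idx, best_ratio = -1, 0.0
        for i, region in enumerate(layouts):
            ratio = overlap_ratio(tb["bbox"], region["bbox"])
            if ratio > best_ratio:
                best_idx, best_ratio = i, ratio
        buckets[best_idx if best_ratio >= min_overlap else -1].append(tb)

    segments = []
    for i, region in enumerate(layouts):
        # Inside one region, read lines top-to-bottom, then left-to-right.
        lines = sorted(buckets.get(i, []), key=lambda b: (b["bbox"][1], b["bbox"][0]))
        segments.append({
            "label": region["label"],
            "bbox": region["bbox"],
            "text": "\n".join(b["text"] for b in lines),
        })
    unassigned = [b["text"] for b in buckets.get(-1, [])]
    return segments, unassigned


def reading_order(segments: List[Dict], page_width: float) -> List[Dict]:
    """Order segments for a simple two-column page: left column, then right.

    Assumption: a region is in the left column if its horizontal centre is
    left of the page midpoint. Full-width regions (e.g. a title spanning both
    columns) are not handled specially.
    """
    def key(seg: Dict):
        x0, top, x1, _ = seg["bbox"]
        column = 0 if (x0 + x1) / 2 < page_width / 2 else 1
        return (column, top)
    return sorted(segments, key=key)


if __name__ == "__main__":
    layouts = [
        {"bbox": (0, 0, 290, 400), "label": "text"},
        {"bbox": (310, 0, 600, 200), "label": "table"},
    ]
    ocr = [
        {"bbox": (10, 10, 280, 30), "text": "Left column, line 1"},
        {"bbox": (10, 40, 280, 60), "text": "Left column, line 2"},
        {"bbox": (320, 10, 590, 30), "text": "Cell A | Cell B"},
    ]
    segments, leftover = group_text_by_layout(layouts, ocr)
    for seg in reading_order(segments, page_width=600):
        print(seg["label"], "->", repr(seg["text"]))
    # prints:
    # text -> 'Left column, line 1\nLeft column, line 2'
    # table -> 'Cell A | Cell B'
```

Measuring overlap as a fraction of the text box's own area (rather than intersection-over-union) keeps small text lines from being rejected just because the layout region around them is much larger. The `min_overlap` threshold of 0.5 is an arbitrary starting point, not a tuned value.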