
General Enquiry: Does this tool take care of table extraction and borderless table extraction from a PDF file?

Open • ramda1234786 opened this issue 1 year ago • 2 comments

Does this tool take care of table extraction, including borderless tables, from a PDF file, and then do meaningful chunking so the output can be sent to any system?

Thanks, RamDa

ramda1234786 • Jan 15 '24 10:01

At the moment, Sycamore uses Amazon Textract to perform table extraction (see here). We're working on a solution that doesn't require AWS credentials.

HenryL27 • Jan 24 '24 18:01
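As a rough illustration of what the Amazon Textract route involves (independent of how Sycamore wraps it), the sketch below rebuilds tables from the `Blocks` list that Textract's `AnalyzeDocument` API returns when called with the `TABLES` feature: each `TABLE` block points to its `CELL` children, and each cell carries `RowIndex`/`ColumnIndex` plus `WORD` children holding the text. The function name `tables_from_textract` is hypothetical, and merged cells (`MERGED_CELL` blocks) and selection elements are ignored for brevity.

```python
from typing import Any, Iterator


def tables_from_textract(response: dict[str, Any]) -> list[list[list[str]]]:
    """Rebuild each TABLE block of a Textract AnalyzeDocument response as rows of cell text.

    Hypothetical helper for illustration; MERGED_CELL blocks are not handled.
    """
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}

    def child_ids(block: dict[str, Any]) -> Iterator[str]:
        # Only follow CHILD relationships (a TABLE also has MERGED_CELL ones).
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                yield from rel["Ids"]

    tables = []
    for block in response.get("Blocks", []):
        if block["BlockType"] != "TABLE":
            continue
        cells: dict[tuple[int, int], str] = {}
        n_rows = n_cols = 0
        for cid in child_ids(block):
            cell = blocks[cid]
            if cell["BlockType"] != "CELL":
                continue
            # Join the WORD children of the cell; empty cells have no children.
            words = [blocks[w]["Text"] for w in child_ids(cell)
                     if blocks[w]["BlockType"] == "WORD"]
            r, c = cell["RowIndex"], cell["ColumnIndex"]  # 1-based indices
            cells[(r, c)] = " ".join(words)
            n_rows, n_cols = max(n_rows, r), max(n_cols, c)
        tables.append([[cells.get((r, c), "") for c in range(1, n_cols + 1)]
                       for r in range(1, n_rows + 1)])
    return tables
```

With real credentials, the input would come from a call such as `boto3.client("textract").analyze_document(Document={"Bytes": pdf_bytes}, FeatureTypes=["TABLES"])`. The synchronous call only handles single-page documents, so a multi-page PDF would need the asynchronous `start_document_analysis` API instead.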

Thanks @HenryL27, so this means it supports tables, including borderless tables and tables with merged cells, in a PDF file. When will this new version be available to try?

Thanks, RamDa

ramda1234786 • Feb 03 '24 04:02

@ramda1234786 we got our own table extraction now!

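import sycamore
from sycamore.transforms.partition import SycamorePartitioner
ctx = sycamore.init()
docset = (
    ctx.read.binary(binary_format="pdf", paths=paths)  # paths: your PDF file(s)
    .partition(partitioner=SycamorePartitioner(extract_table_structure=True))
    # ... continue processing
)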

HenryL27 • Jun 07 '24 21:06
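The thread does not say how the partitioned output is chunked for the "meaningful chunking" part of the question. As a generic illustration only (this is not Sycamore's API), one table-aware strategy is to greedily merge consecutive text elements up to a size budget while emitting every table as its own chunk, so a table is never split or mixed with surrounding prose. The `Element` class, the `"text"`/`"table"` labels, and the character-based `max_chars` budget are all assumptions for this sketch; a real pipeline would more likely count tokens.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str  # hypothetical labels: "text" or "table"
    text: str


def chunk_elements(elements: list[Element], max_chars: int = 1000) -> list[str]:
    """Greedily merge consecutive text elements up to max_chars; keep each table whole."""
    chunks: list[str] = []
    buf: list[str] = []
    size = 0  # length of "\n".join(buf)

    def flush() -> None:
        nonlocal buf, size
        if buf:
            chunks.append("\n".join(buf))
            buf, size = [], 0

    for el in elements:
        if el.kind == "table":
            flush()                 # close the running text chunk first
            chunks.append(el.text)  # table stays whole, even if it exceeds max_chars
            continue
        extra = len(el.text) + (1 if buf else 0)  # +1 for the joining newline
        if buf and size + extra > max_chars:
            flush()
            extra = len(el.text)
        # A single text element longer than max_chars still becomes one chunk.
        buf.append(el.text)
        size += extra
    flush()
    return chunks
```

Keeping tables as standalone chunks trades uneven chunk sizes for the guarantee that a row is never separated from its header, which tends to matter more for downstream retrieval than uniform length.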