sycamore
General Enquiry: Does this tool take care of table extraction and borderless table extraction from a PDF file?
General Enquiry: Does this tool take care of table extraction and borderless table extraction from a PDF file, and then do meaningful chunking so the result can be sent to any system?
Thanks RamDa
At the moment this uses Amazon Textract to perform this operation (see here). We're working on a solution that doesn't require AWS credentials.
Thanks @HenryL27. So this means it supports tables, including borderless tables and tables with merged cells, from a PDF file? When will the new version be available to try?
Thanks RamDa
@ramda1234786 we got our own table extraction now!
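import sycamore
from sycamore.transforms.partition import SycamorePartitioner

ctx = sycamore.init()
docset = (  # parentheses let the method chain span several lines
    ctx.read.binary(binary_format="pdf", paths=paths)
    .partition(partitioner=SycamorePartitioner(extract_table_structure=True))
)  # ... continue processing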
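On the chunking part of the question, here is a minimal sketch in plain Python of one way to chunk partitioned output so that tables stay intact. It does not use Sycamore's own API: the `Element` class, the `"table"` type label, and `chunk_elements` are hypothetical names made up for illustration, and the real element objects may look different.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a partitioned element (not Sycamore's class)."""
    type: str  # assumed labels such as "Text" or "table"
    text: str


def chunk_elements(elements, max_chars=500):
    """Group text elements into chunks of roughly max_chars characters.

    Each table is emitted as its own chunk so it is never split or merged
    with surrounding text.
    """
    chunks, buffer, size = [], [], 0

    def flush():
        # Emit whatever text has accumulated and reset the buffer.
        nonlocal buffer, size
        if buffer:
            chunks.append("\n".join(buffer))
        buffer, size = [], 0

    for el in elements:
        if el.type == "table":
            flush()
            chunks.append(el.text)
            continue
        # Start a new chunk if adding this element would exceed the budget.
        if buffer and size + len(el.text) > max_chars:
            flush()
        buffer.append(el.text)
        size += len(el.text)

    flush()
    return chunks
```

The idea is just that table boundaries act as hard chunk boundaries while ordinary text is packed up to a size budget. A downstream system would then receive each table as a single unit.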