amazon-textract-textractor
How can I order the results as shown in the pdf?

Example: python3 textractor.py --documents s3://mybucket/mydoc.pdf --forms
Result:
62692bb61ab53-pdf-page-1-forms.csv

How can I get the results in this order?

@robertdac What kind of ordering would you like to apply? Do you need to use the command line, or can you also write your own Python code?
Cheers, Tobias
Did you check the output described on https://github.com/aws-samples/amazon-textract-textractor? It lists "document-page-n-text-inreadingorder.txt: Detected text in reading order (multi-column) for each page in the document."
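For multi-column pages, the general idea behind reading order can be sketched as follows. This is a simplified heuristic and not the library's actual reading-order implementation: it assumes each detected line is a dict with hypothetical `text`, `top`, and `left` keys (normalized 0-1 page coordinates, like the `Top`/`Left` fields of Textract's `BoundingBox`) and a fixed boundary between exactly two columns.

```python
from typing import Any, Dict, List

# Hypothetical line record: {"text": str, "top": float, "left": float},
# with coordinates normalized to 0-1 and the origin at the top-left of the page.
Line = Dict[str, Any]


def two_column_order(lines: List[Line], split_x: float = 0.5) -> List[str]:
    """Read the whole left column top to bottom, then the right column.

    split_x is an assumed, fixed column boundary; real layouts would need
    the boundary detected per page.
    """
    left = [ln for ln in lines if ln["left"] < split_x]
    right = [ln for ln in lines if ln["left"] >= split_x]
    ordered = sorted(left, key=lambda ln: ln["top"]) + sorted(right, key=lambda ln: ln["top"])
    return [ln["text"] for ln in ordered]


# Made-up sample lines, listed out of order on purpose.
lines = [
    {"text": "Right 1", "top": 0.10, "left": 0.55},
    {"text": "Left 1", "top": 0.10, "left": 0.05},
    {"text": "Left 2", "top": 0.20, "left": 0.05},
    {"text": "Right 2", "top": 0.20, "left": 0.55},
]
print(two_column_order(lines))
```

A naive top-then-left sort would interleave the columns (Left 1, Right 1, Left 2, Right 2), which is why multi-column text needs the column split first.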
Hi @tb102122, I'm facing a similar issue with key-value pairs exported as CSV using a Python script.
The checkboxes and key-value pairs do not appear to be in any specific order. As there are quite a few duplicate checkboxes (e.g. Yes/No), I was hoping to order them, if possible, left to right and top to bottom.
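from textractor import Textractor
from textractor.data.constants import TextractFeatures

extractor = Textractor(profile_name="default")
document = extractor.start_document_analysis(
    file_source="Application Form trimmed.pdf",
    features=[TextractFeatures.FORMS],
    s3_upload_path="s3://textractbucket2/",
)
document.export_kv_to_csv(
    include_kv=True,
    include_checkboxes=True,
    filepath="async_kv.csv",
)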
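One way to get a left-to-right, top-to-bottom order is to sort the fields by position yourself before writing the CSV. This is a minimal sketch, assuming each key-value pair or checkbox has already been turned into a dict with hypothetical `key`, `value`, `top`, and `left` entries (for example from the key's bounding box; how to read those from the textractor objects is left open here). Fields whose `top` values fall within a small, hypothetical `row_tolerance` are grouped into one visual row, so side-by-side Yes/No checkboxes stay together.

```python
from typing import Any, Dict, List

# Hypothetical field record: {"key": str, "value": str, "top": float, "left": float},
# with top/left normalized to 0-1 like Textract's BoundingBox Top/Left.
Field = Dict[str, Any]


def sort_reading_order(fields: List[Field], row_tolerance: float = 0.01) -> List[Field]:
    """Order fields top to bottom, then left to right within each visual row.

    A field joins the current row when its top is within row_tolerance of
    the row's first field; otherwise it starts a new row.
    """
    rows: List[List[Field]] = []
    for field in sorted(fields, key=lambda f: f["top"]):
        if rows and abs(field["top"] - rows[-1][0]["top"]) <= row_tolerance:
            rows[-1].append(field)
        else:
            rows.append([field])
    ordered: List[Field] = []
    for row in rows:
        # Within a row, read from left to right.
        ordered.extend(sorted(row, key=lambda f: f["left"]))
    return ordered


# Made-up sample fields, listed out of order on purpose.
fields = [
    {"key": "No", "value": "NOT_SELECTED", "top": 0.305, "left": 0.60},
    {"key": "Name", "value": "Jane", "top": 0.100, "left": 0.10},
    {"key": "Yes", "value": "SELECTED", "top": 0.300, "left": 0.50},
    {"key": "Date", "value": "2022-01-01", "top": 0.102, "left": 0.55},
]
for f in sort_reading_order(fields):
    print(f["key"], f["value"])
```

A plain sort on `(top, left)` tends to fail here because fields on the same visual row rarely share exactly the same `top`; slightly skewed scans may need a larger tolerance.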
@syley This example should help you, although your case sounds a bit more complex: https://github.com/aws-samples/amazon-textract-textractor/blob/master/tpipelinegeofinder/geofinder-sample-notebook.ipynb