s3-ocr
s3-ocr file command to process a single PDF
This would still require a bucket, since PDFs have to be stored in an S3 bucket for Textract to process them.
Maybe it could have an option to block and poll for completion?
The default operation can be to put the object in the bucket and then start an OCR run against it.
It can use the same filename as the key, but should return an error if an object with that name already exists in the bucket.
s3-ocr file my-bucket document.pdf
Default mode outputs a message saying that the file has been uploaded and put in the OCR queue.
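A minimal sketch of that default flow, under stated assumptions: `upload_and_start_ocr` is a hypothetical helper name, not existing s3-ocr code, and `s3` and `textract` are assumed to be boto3 clients passed in by the caller. The existence check uses `list_objects_v2`, and the job is started with Textract's `start_document_text_detection`.

```python
from pathlib import Path


def upload_and_start_ocr(s3, textract, bucket, path):
    """Upload `path` to `bucket` under its own filename and start an OCR job.

    Hypothetical helper: `s3` and `textract` are assumed to be boto3 clients,
    e.g. boto3.client("s3") and boto3.client("textract").
    Returns the Textract JobId for the queued job.
    """
    key = Path(path).name

    # S3 lists keys in sorted order, so if the exact key exists it is the
    # first result for this prefix and MaxKeys=1 is enough.
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    if any(obj["Key"] == key for obj in listing.get("Contents", [])):
        raise FileExistsError(f"{key} already exists in bucket {bucket}")

    s3.upload_file(str(path), bucket, key)

    # Textract reads the PDF from the bucket, not from the local file.
    response = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return response["JobId"]
```

The check and the upload are two separate requests, so this does not protect against another writer creating the same key in between.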
Option --wait waits for it to complete and then returns the text version of the OCR.
--wait --json blocks and then returns the output of fetch --combine to standard output.
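The blocking behaviour of --wait could be sketched roughly as follows. `wait_for_text` is a hypothetical helper name, `textract` is assumed to be a boto3 Textract client, and the poll interval is an arbitrary choice. It polls `get_document_text_detection` until the job leaves `IN_PROGRESS`, then follows `NextToken` to collect every `LINE` block. The --json variant is not covered here.

```python
import time


def wait_for_text(textract, job_id, poll_interval=5.0, sleep=time.sleep):
    """Block until the Textract job finishes, then return its text.

    Hypothetical helper: `textract` is assumed to be boto3.client("textract").
    The result has one detected line of text per output line.
    """
    # Poll until the job is no longer in progress.
    while True:
        response = textract.get_document_text_detection(JobId=job_id)
        status = response["JobStatus"]
        if status != "IN_PROGRESS":
            break
        sleep(poll_interval)

    # This sketch treats anything other than full success as an error.
    if status != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Results are paginated; follow NextToken until it runs out.
    lines = []
    while True:
        lines.extend(
            block["Text"]
            for block in response.get("Blocks", [])
            if block["BlockType"] == "LINE"
        )
        token = response.get("NextToken")
        if not token:
            break
        response = textract.get_document_text_detection(
            JobId=job_id, NextToken=token
        )
    return "\n".join(lines)
```

Passing `sleep` as a parameter is only there so the polling loop can be exercised without real delays.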