
s3-ocr file command to process a single PDF

Open · simonw opened this issue 2 years ago · 1 comment

This would still require a bucket, since PDFs sent through Textract need to go through a bucket.

Maybe it could have an option to block and poll for completion?

The default operation can be to upload the file to the bucket and then start an OCR run against it.

It can use the same filename as the key, but return an error if an object with that name already exists.
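
A rough sketch of that default flow, not the actual implementation. It assumes boto3-style clients are passed in; `upload_and_start_ocr` and `ObjectExistsError` are hypothetical names made up for illustration. The client calls (`list_objects_v2`, `upload_file`, `start_document_text_detection`) are standard boto3 S3 and Textract operations.

```python
from pathlib import Path


class ObjectExistsError(Exception):
    """Raised when the bucket already holds an object with the target key."""


def upload_and_start_ocr(s3, textract, bucket, pdf_path):
    """Upload a local PDF under its own filename and start a Textract job.

    `s3` and `textract` are expected to behave like boto3 clients.
    Returns the Textract job ID.
    """
    key = Path(pdf_path).name

    # S3 lists keys in sorted order, so an exact match for the prefix
    # (if there is one) is the first result.
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    if any(obj["Key"] == key for obj in listing.get("Contents", [])):
        raise ObjectExistsError(f"{key} already exists in {bucket}")

    # Put the file in the bucket, then queue asynchronous text detection.
    s3.upload_file(str(pdf_path), bucket, key)
    response = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return response["JobId"]
```

With real clients this would be called as `upload_and_start_ocr(boto3.client("s3"), boto3.client("textract"), "my-bucket", "document.pdf")`. Note that the existence check and the upload are two separate requests, so this does not protect against a concurrent upload of the same key.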

simonw · Jun 30 '22 01:06

s3-ocr file my-bucket document.pdf

Default mode outputs a message saying that the file has been uploaded and put in the OCR queue.

The --wait option blocks until the OCR job completes and then outputs the text version of the result.

--wait --json blocks and then writes the output of fetch --combine to standard output.
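
One way the --wait behaviour might work, as a sketch only: poll Textract directly until the job finishes, then join the detected lines. `wait_for_text` is a hypothetical helper, not an existing s3-ocr function; `get_document_text_detection` and its `JobStatus`, `Blocks` and `NextToken` fields are part of the boto3 Textract client. The real command might read the results some other way.

```python
import time


def wait_for_text(textract, job_id, poll_interval=5.0, sleep=time.sleep):
    """Block until a Textract text detection job finishes, then return its text.

    `textract` is expected to behave like a boto3 Textract client.
    `sleep` is injectable so the polling loop can be exercised without waiting.
    """
    # Poll until the job leaves the IN_PROGRESS state.
    while True:
        response = textract.get_document_text_detection(JobId=job_id)
        status = response["JobStatus"]
        if status != "IN_PROGRESS":
            break
        sleep(poll_interval)

    if status != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Results are paginated: keep following NextToken and collect LINE blocks.
    lines = []
    while True:
        lines.extend(
            block["Text"]
            for block in response.get("Blocks", [])
            if block["BlockType"] == "LINE"
        )
        token = response.get("NextToken")
        if not token:
            return "\n".join(lines)
        response = textract.get_document_text_detection(
            JobId=job_id, NextToken=token
        )
```

A fixed polling interval keeps the sketch short. A real implementation would probably want a timeout and some backoff.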

simonw · Jun 30 '22 01:06