azure-search-openai-demo
Form Recognizer `begin_analyze_document` timeout on large files
Please provide us with the following information:
This issue is for a: (mark with an x)
- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
Attempt to process a document of 390 or more pages
Any log messages given by the failure
```
Uploading blob for page 390 -> PA - Sch 23 - Extracts from Proposal-390.pdf
Extracting text from 'C:\Users\dboyd\Documents\DesignSpecs/data\PA - Sch 23 - Extracts from Proposal.pdf' using Azure Form Recognizer
Traceback (most recent call last):
  File "C:\Users\dboyd\Documents\DesignSpecs\scripts\prepdocs.py", line 379, in
```
Expected/desired behavior
We need to be able to set a longer timeout for large files in the `begin_analyze_document` call.
OS and Version?
Windows 10
azd version?
azd version 1.1.0 (commit ea9cb12575734ee6a5f99c4d415c1a51d6f32d3e)
Versions
Mention any other details that might be useful
The code below is what is timing out:

```python
with open(filename, "rb") as f:
    poller = form_recognizer_client.begin_analyze_document("prebuilt-layout", document=f)
```
Given that the entire bytestream of the large file has to be sent to the endpoint, this looks like a plain HTTP timeout. However, the API documentation offers no way to change the timeout for the `begin_analyze_document` call.
I do not believe that rewriting the example to use async IO will help, since this is an endpoint timeout.
Thanks! We'll be in touch soon.
I also don't see anything in the docs to extend the timeout: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/formrecognizer/azure-ai-formrecognizer/README.md
You could log an issue in the azure-sdk-for-python repo about this to see if they have any feedback. However, it may just be a limitation of the underlying API. So a workaround would be to preprocess the PDF to split it into smaller documents.
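The splitting workaround above can be sketched as follows. The page-range math below is pure standard library; the comments mention pypdf's `PdfReader`/`PdfWriter` as one way to actually write out each chunk, which is my assumption about tooling rather than anything this repo uses.

```python
def chunk_page_ranges(total_pages: int, pages_per_chunk: int) -> list[tuple[int, int]]:
    """Split [0, total_pages) into half-open (start, end) page ranges."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be >= 1")
    return [
        (start, min(start + pages_per_chunk, total_pages))
        for start in range(0, total_pages, pages_per_chunk)
    ]

# A 390-page document split into 100-page chunks:
print(chunk_page_ranges(390, 100))
# -> [(0, 100), (100, 200), (200, 300), (300, 390)]

# Each (start, end) range would then be written to its own smaller PDF,
# e.g. with pypdf: writer.add_page(reader.pages[i]) for i in range(start, end),
# and each smaller file submitted to begin_analyze_document separately.
```

Keeping chunks well under the size that triggers the timeout gives some headroom if page complexity varies across the document.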
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.
I am planning to build an app with Azure Document Intelligence, and while testing the capabilities of the service I hit this issue too when trying to convert a large file. It looks like this is not a priority, so perhaps I can split the PDF before sending it...
Is there any update on this? I am getting the following error when trying to analyze a 5 MB PDF:
"azure.core.exceptions.HttpResponseError: (Timeout) The operation was timeout. Code: Timeout Message: The operation was timeout."
I'd rather not have to split the document into smaller chunks beforehand. Any ideas / solutions?
I'm encountering the same error with the REST API.
```json
{ "error": { "code": "Timeout", "message": "The operation was timeout." } }
```
+1.
The only workaround seems to be provisioning more Document Intelligence services and splitting the doc into smaller chunks, which isn't a great solution. Would love a timeout setting or parallelism support.
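Until the service offers a timeout setting, parallelism can be done client-side: submit each split chunk concurrently and gather the results. Here is a minimal sketch, where `analyze_chunk` is a hypothetical stand-in for a real call to `begin_analyze_document(...).result()` (both the function name and its stub body are mine, not the SDK's):

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_chunk(path: str) -> str:
    # Hypothetical stand-in: a real implementation would open `path` and call
    # form_recognizer_client.begin_analyze_document("prebuilt-layout", document=f).result()
    return f"analyzed {path}"

def analyze_all(chunk_paths: list[str], max_workers: int = 4) -> list[str]:
    # The analyze call is a long-running operation, so threads spend most of
    # their time waiting on the network; a thread pool is sufficient here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_chunk, chunk_paths))

print(analyze_all(["part-0.pdf", "part-1.pdf"]))
# -> ['analyzed part-0.pdf', 'analyzed part-1.pdf']
```

`pool.map` preserves input order, so results can be concatenated back into document order; watch the service's request rate limits before raising `max_workers`.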
Hi all, thanks for the feedback. I've created an issue in our Azure SDK repo and we'll investigate ASAP.
Is anyone on the thread able to share a PDF that resulted in a timeout? If so, please email to pamelafox at microsoft . com
@pamelafox Please check your inbox as I have sent you a sample file to reproduce this issue. Furthermore, this issue occurs when using the Markdown output format.