azure-search-openai-demo
Form Recognizer `begin_analyze_document` timeout on large files
Please provide us with the following information:
This issue is for a: (mark with an x)
- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
Attempt to process a document of 390 or more pages
Any log messages given by the failure
```
Uploading blob for page 390 -> PA - Sch 23 - Extracts from Proposal-390.pdf
Extracting text from 'C:\Users\dboyd\Documents\DesignSpecs/data\PA - Sch 23 - Extracts from Proposal.pdf' using Azure Form Recognizer
Traceback (most recent call last):
  File "C:\Users\dboyd\Documents\DesignSpecs\scripts\prepdocs.py", line 379, in
```
Expected/desired behavior
We need to be able to set a longer timeout for large files in the `begin_analyze_document` call.
OS and Version?
Windows 10
azd version?
azd version 1.1.0 (commit ea9cb12575734ee6a5f99c4d415c1a51d6f32d3e)
Versions
Mention any other details that might be useful
The code below is what is timing out:

```python
with open(filename, "rb") as f:
    poller = form_recognizer_client.begin_analyze_document("prebuilt-layout", document=f)
```
Given that the entire bytestream of the large file has to be sent to the endpoint, this looks like a plain HTTP timeout. However, the API documentation offers no way to change the timeout for the `begin_analyze_document` call.
I do not believe that rewriting the example to use async IO will help, since this is an endpoint timeout.
Thanks! We'll be in touch soon.
I also don't see anything in the docs to extend the timeout: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/formrecognizer/azure-ai-formrecognizer/README.md
You could log an issue in the azure-sdk-for-python repo about this to see if they have any feedback. However, it may just be a limitation of the underlying API. So a workaround would be to preprocess the PDF to split it into smaller documents.
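The splitting workaround above can be sketched as follows. The page-range math below is pure standard library; the comments mention pypdf's `PdfReader`/`PdfWriter` as one way to actually write out each chunk, which is my assumption about tooling rather than anything this repo uses.

```python
def chunk_page_ranges(total_pages: int, pages_per_chunk: int) -> list[tuple[int, int]]:
    """Split [0, total_pages) into half-open (start, end) page ranges."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be >= 1")
    return [
        (start, min(start + pages_per_chunk, total_pages))
        for start in range(0, total_pages, pages_per_chunk)
    ]

# A 390-page document split into 100-page chunks:
print(chunk_page_ranges(390, 100))
# -> [(0, 100), (100, 200), (200, 300), (300, 390)]

# Each (start, end) range would then be written to its own smaller PDF,
# e.g. with pypdf: writer.add_page(reader.pages[i]) for i in range(start, end),
# and each smaller file submitted to begin_analyze_document separately.
```

Keeping chunks well under the size that triggers the timeout gives some headroom if page complexity varies across the document.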
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.
I am planning to build an app with Azure Document Intelligence, and while testing the capabilities of the service I hit this issue too when trying to convert a large file. It looks like this is not a priority, so perhaps I can split the PDF before sending it...
Is there any update on this? I am getting the following error when trying to analyze a 5 MB PDF:
"azure.core.exceptions.HttpResponseError: (Timeout) The operation was timeout. Code: Timeout Message: The operation was timeout."
I'd rather not have to split the document into smaller chunks beforehand. Any ideas / solutions?
I'm encountering the same error with the REST API.
```json
{ "error": { "code": "Timeout", "message": "The operation was timeout." } }
```
+1.
The only workaround seems to be provisioning more Document Intelligence services and splitting the doc into smaller chunks, which isn't a great solution. Would love a timeout setting or parallelism support.
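Until the service offers a timeout setting, parallelism can be done client-side: submit each split chunk concurrently and gather the results. Here is a minimal sketch, where `analyze_chunk` is a hypothetical stand-in for a real call to `begin_analyze_document(...).result()` (both the function name and its stub body are mine, not the SDK's):

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_chunk(path: str) -> str:
    # Hypothetical stand-in: a real implementation would open `path` and call
    # form_recognizer_client.begin_analyze_document("prebuilt-layout", document=f).result()
    return f"analyzed {path}"

def analyze_all(chunk_paths: list[str], max_workers: int = 4) -> list[str]:
    # The analyze call is a long-running operation, so threads spend most of
    # their time waiting on the network; a thread pool is sufficient here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_chunk, chunk_paths))

print(analyze_all(["part-0.pdf", "part-1.pdf"]))
# -> ['analyzed part-0.pdf', 'analyzed part-1.pdf']
```

`pool.map` preserves input order, so results can be concatenated back into document order; watch the service's request rate limits before raising `max_workers`.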
Hi all, thanks for the feedback. I've created an issue in our Azure SDK repo and we'll investigate ASAP.
Is anyone on the thread able to share a PDF that resulted in a timeout? If so, please email to pamelafox at microsoft . com
@pamelafox Please check your inbox as I have sent you a sample file to reproduce this issue. Furthermore, this issue occurs when using the Markdown output format.