azure-sdk-for-python
[Text analytics] Warning about document length inconsistent with API service documentation
- Package Name: azure-ai-textanalytics
- Package Version: 5.2.1
- Operating System:
- Python Version: 3.10
Describe the bug
The Text Analytics service documentation states that the maximum size per document is 30,720 characters. However, when submitting a considerably smaller document, a warning is still returned. See the traceback below, which prints the value of the document and the warning for the same AnalyzeHealthcareEntitiesResult object.
See data limits documentation here: https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/concepts/data-limits#maximum-characters-per-document
To Reproduce
Steps to reproduce the behavior:
- Submit a document longer than 8,000 characters to the Text Analytics API
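The step above can be sketched roughly as follows. This is not the reporter's original script: the filler sentence, the `build_document` and `collect_warnings` helper names, and the `LANGUAGE_ENDPOINT` / `LANGUAGE_KEY` environment variable names are all assumptions made for illustration. The 8,000 and 30,720 figures are the ones quoted in this issue.

```python
import os

# 8,000 is the length reported in this issue; 30,720 is the per-document
# limit quoted from the data limits page. Neither value comes from the SDK.
REPORTED_WARNING_THRESHOLD = 8_000
DOCUMENTED_MAX_CHARS = 30_720


def build_document(n_chars, sentence="Patient was prescribed 100mg ibuprofen twice daily. "):
    """Repeat a filler sentence (hypothetical text) and trim to exactly n_chars."""
    repeats = n_chars // len(sentence) + 1
    return (sentence * repeats)[:n_chars]


def collect_warnings(endpoint, key, text):
    """Submit one document for healthcare analysis and return (code, message) pairs."""
    # Imported lazily so build_document can be used without the SDK installed.
    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential

    client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    poller = client.begin_analyze_healthcare_entities([text])
    found = []
    for doc in poller.result():
        if doc.is_error:
            # A failed document carries an error instead of warnings.
            raise RuntimeError(f"{doc.error.code}: {doc.error.message}")
        found.extend((w.code, w.message) for w in doc.warnings)
    return found


if __name__ == "__main__":
    # Assumed environment variable names; the service is only called if both are set.
    endpoint = os.environ.get("LANGUAGE_ENDPOINT")
    key = os.environ.get("LANGUAGE_KEY")
    text = build_document(REPORTED_WARNING_THRESHOLD + 1_000)
    print(f"document length: {len(text)} (documented max: {DOCUMENTED_MAX_CHARS})")
    if endpoint and key:
        for code, message in collect_warnings(endpoint, key, text):
            print(f"warning {code}: {message}")
```

Varying the argument to `build_document` on either side of 8,000 may help narrow down the length at which the warning starts to appear.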
Expected behavior
No warning is received.

Screenshots
Additional context
This happens both with the default Text Analytics model and when using model_version 2022-08-15-preview.

Label prediction was below confidence level 0.6 for Model:ServiceLabels: 'Cognitive - Text Analytics:0.54613066,Docs:0.21392056,Cognitive Services:0.044528954'
Hey @justinqquall, my understanding is that the 30k character limit is the maximum you can send in a request; anything beyond that and the request will fail. From your screenshot, the request succeeds but returns a warning. Warnings are usually returned to indicate that the quality of the model prediction may be affected for some reason. Tagging @peytonfraser from the Language service team to confirm.
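To illustrate the distinction between a failed request and a successful one that carries a warning, here is a minimal sketch that sorts results into three groups. It relies only on the `id`, `is_error`, and `warnings` attributes that the SDK's result objects expose. The `split_results` helper is hypothetical, and the stand-in objects and the warning code shown are placeholders, not the values from this issue.

```python
from types import SimpleNamespace


def split_results(results):
    """Return (failed, warned, clean) lists of document ids."""
    failed, warned, clean = [], [], []
    for doc in results:
        if doc.is_error:
            # The service rejected this document (for example, over the size limit).
            failed.append(doc.id)
        elif doc.warnings:
            # The document was processed, but the service flagged something.
            warned.append(doc.id)
        else:
            clean.append(doc.id)
    return failed, warned, clean


# Stand-in objects that mimic only the attributes used above (placeholder data).
fake_results = [
    SimpleNamespace(id="0", is_error=False, warnings=[]),
    SimpleNamespace(
        id="1",
        is_error=False,
        warnings=[SimpleNamespace(code="DocumentTruncated", message="placeholder")],
    ),
    SimpleNamespace(id="2", is_error=True),
]

print(split_results(fake_results))  # (['2'], ['1'], ['0'])
```

With real results, the same function could be passed the iterable returned by `poller.result()`.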
Adding @aurghob to confirm.
Hi, apologies for the delayed reply. We have this task in our backlog and will prioritize accordingly.