document-understanding-solution

Bug described in AWS DUS (Case 9497917291)

Open samirpadomega opened this issue 3 years ago • 1 comments

Describe the bug

Uploaded 3 PDF files (reasonably large). Two files ended with status "Failed"; one file has status "Ready", but clicking it shows no data.

To Reproduce

Upload the attached files into a DUS deployment (Kendra enabled).

samirpadomega • Jan 25 '22 22:01

Hi @samirpadomega, I believe I talked to Aimad about this case and was able to solve the issues you are seeing:

  1. For the error

botocore.errorfactory.TextSizeLimitExceededException: An error occurred (TextSizeLimitExceededException) when calling the BatchDetectEntities operation: Input text size exceeds limit. Max length of request text allowed is 5000 bytes while in this request the text size is 5002 bytes

you need to make the following change in source/lambda/helper/python/comprehendHelper.py, line 81, so that the projected size is measured in UTF-8 bytes:

projectedSize = len(rawPages[pageResultIndex].encode('utf-8')) + len(block['Text'].encode('utf-8')) + 2

  2. For ReadTimeoutError when calling pdfgeneratorlambda, you need to make the following changes in generatePdf() in jobresultprocessorlambda.

Add this at the top of the file:

import botocore

Lines 39-40:

config = botocore.config.Config(read_timeout=900, connect_timeout=900)
client = boto3.client('lambda', config=config)

These two changes should resolve the issues you are seeing with the attached documents.
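To illustrate why the first fix measures encoded bytes rather than characters, here is a minimal sketch of packing text blocks into chunks that stay under the 5000-byte limit quoted in the error message. The names `utf8_len` and `pack_blocks` are hypothetical helpers written for this example; this is not the code in comprehendHelper.py, only the same idea in isolation.

```python
MAX_BYTES = 5000  # limit quoted in the TextSizeLimitExceededException message


def utf8_len(text: str) -> int:
    """Return the size of text in bytes once UTF-8 encoded (not its character count)."""
    return len(text.encode("utf-8"))


def pack_blocks(blocks, max_bytes=MAX_BYTES):
    """Greedily join text blocks with spaces into chunks of at most max_bytes.

    Hypothetical helper for illustration only.
    """
    chunks = []
    current = ""
    for text in blocks:
        if utf8_len(text) > max_bytes:
            raise ValueError("a single block exceeds the byte limit")
        # Add 1 byte for the joining space when the chunk already has content.
        projected = utf8_len(current) + utf8_len(text) + (1 if current else 0)
        if current and projected > max_bytes:
            # The block would push the chunk over the limit: start a new chunk.
            chunks.append(current)
            current = text
        else:
            current = f"{current} {text}" if current else text
    if current:
        chunks.append(current)
    return chunks
```

With two blocks of 1500 "é" characters each, the combined text is only 3001 characters but 6001 bytes, so a character-based check would wrongly keep them in one request while the byte-based check splits them.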

ShivaniMehendarge • Feb 09 '22 19:02

@samirpadomega, in version v1.0.10 we also changed the default DPI in the pdfgenerator from 300 DPI to 100 DPI. This can also be controlled with the IMAGE_DPI environment variable set on the pdfgenerator lambda. I am closing this issue for now, but if something is still not working, please feel free to reopen the ticket or create a new one.
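As a rough illustration of what the DPI change means for large documents, the sketch below computes the raster size of a page at both settings. It assumes a US Letter page (8.5 x 11 inches), which is not stated in this thread, and the function names are hypothetical; how the pdfgenerator actually renders pages is not shown here.

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple:
    """Return the (width, height) in pixels of a page rendered at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def pixel_ratio(dpi_from: int, dpi_to: int) -> float:
    """Return how many times more pixels a page has at dpi_from than at dpi_to."""
    return (dpi_from / dpi_to) ** 2


# Assumed page size: US Letter, 8.5 x 11 inches.
old_size = page_pixels(8.5, 11, 300)  # previous default DPI
new_size = page_pixels(8.5, 11, 100)  # default DPI since v1.0.10
```

Because the pixel count scales with the square of the DPI, going from 300 to 100 DPI reduces the pixels per page by a factor of nine, which plausibly lowers processing time and memory for large PDFs.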

knihit • Mar 07 '23 18:03