aws-sdk-js

UnsupportedDocumentException: Request has unsupported document format

Open ab958 opened this issue 3 years ago • 3 comments

I am using the AWS SDK. When I call `analyzeDocument` with a PNG file uploaded via form-data, it works fine locally, but when I test the deployed functions it gives this error: `UnsupportedDocumentException: Request has unsupported document format`

ab958 avatar Feb 17 '22 13:02 ab958

Hi @ab958 thanks for reaching out. Can you follow this template and provide us more information on the case:

Confirm by changing [ ] to [x] below to ensure that it's a bug:
- [ ] I've gone through [Developer Guide](https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/welcome.html) and [API reference](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/)
- [ ] I've checked [AWS Forums](https://forums.aws.amazon.com) and [StackOverflow](https://stackoverflow.com/questions/tagged/aws-sdk-js) for answers
- [ ] I've searched for [previous similar issues](https://github.com/aws/aws-sdk-js/issues) and didn't find any solution
- [ ] This is an issue with version 2.x of the SDK

**Describe the bug**
A clear and concise description of what the bug is.

**Is the issue in the browser/Node.js?**
Browser/Node.js

**If on Node.js, are you running this on AWS Lambda?**

**Details of the browser/Node.js version**
Paste output of `npx envinfo --browsers` or `node -v`

**SDK version number**
Example: v2.466.0
* For browsers, the SDK version number is in the script tag <pre>src=".../aws-sdk-<b>2.466.0</b>.min.js"</pre>
* For Node.js, get SDK version by
  * running command `npm list aws-sdk` from your root directory
  * printing the output of `console.log(AWS.VERSION)` in your code where `AWS = require("aws-sdk");`
  * if running on Lambda and using SDK provided by Lambda runtime, you can find the SDK versions [here](https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html) 

**To Reproduce (observed behavior)**
Steps to reproduce the behavior (please share code or minimal repo)

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Additional context**
Add any other context about the problem here.

vudh1 avatar Feb 19 '22 01:02 vudh1

Confirm by changing [ ] to [x] below to ensure that it's a bug:

**Describe the bug**
I am using the Textract `analyzeDocument` method to detect text in my documents. I send images as form-data from Postman and pass the image's buffer data to `analyzeDocument`.

**Is the issue in the browser/Node.js?**
I don't know.

**If on Node.js, are you running this on AWS Lambda?**
Yes.

**Details of the browser/Node.js version**
v14.17.5

**SDK version number**
`"aws-sdk": "^2.1075.0"`

**To Reproduce (observed behavior)**
Steps to reproduce the behavior (please share code or minimal repo)

**Expected behavior**
I am using the Serverless Framework. When I run my Lambda function locally, it detects the text in the image and works fine, but after I deploy the Lambda functions to AWS, it doesn't detect the text. The buffer data for the same image is different locally and in the CloudWatch logs.
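Since the buffer differs between the two environments, one way to narrow this down is to check whether the bytes that reach `analyzeDocument` still start with a recognizable file signature. The following is a minimal diagnostic sketch, not part of the AWS SDK: `sniffDocumentFormat` and `SIGNATURES` are hypothetical names, and the list covers only the formats Textract accepts (PNG, JPEG, PDF, TIFF).

```javascript
// Diagnostic sketch: identify a document format from its leading bytes.
// sniffDocumentFormat and SIGNATURES are hypothetical helpers, not SDK APIs.
const SIGNATURES = [
  { format: "png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { format: "jpeg", bytes: [0xff, 0xd8, 0xff] },
  { format: "pdf", bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { format: "tiff", bytes: [0x49, 0x49, 0x2a, 0x00] }, // little-endian TIFF
  { format: "tiff", bytes: [0x4d, 0x4d, 0x00, 0x2a] }, // big-endian TIFF
];

function sniffDocumentFormat(buffer) {
  for (const { format, bytes } of SIGNATURES) {
    // A signature matches when every one of its bytes equals the buffer's prefix.
    if (buffer.length >= bytes.length && bytes.every((b, i) => buffer[i] === b)) {
      return format;
    }
  }
  return "unknown";
}
```

Logging `sniffDocumentFormat(imageBuffer)` right before the Textract call in both environments shows whether the deployed function still receives a valid PNG. If it reports `"unknown"` only when deployed, the bytes were probably altered somewhere between the HTTP request and the handler (for example, base64-encoded or decoded as text), which would be consistent with the differing buffers.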

ab958 avatar Feb 21 '22 11:02 ab958

For those of you finding this in the future: I had the same problem. It turns out that PDFs with more than one page need to be processed asynchronously, meaning that the usual synchronous call (`analyzeDocument`) won't work. You need to do something like the following:

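```js
import { StartDocumentAnalysisCommand } from "@aws-sdk/client-textract";

// Start an asynchronous analysis job for a document stored in S3.
const response = await textractClient.send(new StartDocumentAnalysisCommand({
  DocumentLocation: { S3Object: { Bucket: bucket, Name: videoName } },
  FeatureTypes: ["TABLES", "FORMS"], // required by the API; pick the types you need
  NotificationChannel: { RoleArn: roleArn, SNSTopicArn: snsTopicArn },
}));
```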

The docs I am basing this on are from here:

https://docs.aws.amazon.com/textract/latest/dg/async-analyzing-with-sqs.html

Yeah burnt a bunch of my time figuring this out. Hopefully it helps.
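The start call above only returns a `JobId`; the results still have to be fetched once the job finishes. If you are not consuming the SNS notification, a simple polling loop can stand in for it. This is a rough sketch under stated assumptions: `waitForTextractJob`, `intervalMs`, and `maxAttempts` are hypothetical names, and `getJob` is any async function that resolves to an object with a `JobStatus` field, such as the response of `GetDocumentAnalysis`.

```javascript
// Polling sketch. waitForTextractJob and its options are hypothetical names.
// getJob: async function resolving to an object with a JobStatus field
// (IN_PROGRESS, SUCCEEDED, PARTIAL_SUCCESS, or FAILED).
async function waitForTextractJob(getJob, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await getJob();
    if (result.JobStatus === "SUCCEEDED" || result.JobStatus === "PARTIAL_SUCCESS") {
      return result;
    }
    if (result.JobStatus === "FAILED") {
      throw new Error(`Textract job failed: ${result.StatusMessage || "no status message"}`);
    }
    // Still IN_PROGRESS: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Textract job still in progress after ${maxAttempts} attempts`);
}

// Possible usage with the v3 client (assumes GetDocumentAnalysisCommand is imported
// from "@aws-sdk/client-textract" and response is the start call's result):
//   const result = await waitForTextractJob(() =>
//     textractClient.send(new GetDocumentAnalysisCommand({ JobId: response.JobId })));
```

Passing the fetch as a function keeps the loop independent of the SDK version. Note that a finished job can return its blocks over several pages, so a `NextToken` in the result means there is more to fetch.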

schematical avatar Jul 18 '22 22:07 schematical

I can also confirm this multi-page PDF issue, using the AWS Java SDK version 1.12.267. To solve it, use the async methods as @schematical said.

```java
@Autowired
private AmazonTextract textract;
...
// send analysis request
final StartDocumentAnalysisRequest requestStart = new StartDocumentAnalysisRequest()
    .withDocumentLocation(new DocumentLocation().withS3Object(<s3object>))
    ...
final String jobId = textract.startDocumentAnalysis(requestStart).getJobId();

// get the response after a while (3 or 4 seconds for me)
final GetDocumentAnalysisResult responseGet = textract.getDocumentAnalysis(new GetDocumentAnalysisRequest().withJobId(jobId));
```

omerhakanbilici avatar Sep 23 '22 12:09 omerhakanbilici

Greetings! We’re closing this issue because it has been open a long time and hasn’t been updated in a while and may not be getting the attention it deserves. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to comment or open a new issue.

github-actions[bot] avatar Sep 24 '23 00:09 github-actions[bot]