amazon-a2i-sample-task-uis
payload for the liquid templates
Hello,
I have been trying to use some of these templates for my A2I integrations. It is great to have them, but for them to be useful we need to know the payload that each template expects from A2I. Can someone point me to the structure of the expected payload?
Thank you!
I realized that the payload is attached to the A2I output. So if you can successfully run an example of an A2I trigger, you can pull the payload from the output under the key "inputContent". Here is the payload for the Textract key-value template: textract-keyvalue-sample.liquid.payload.json
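As a rough sketch of that extraction, assuming the A2I output file has already been downloaded from S3 as JSON text. The function name `extract_input_content` and the sample values are hypothetical and only for illustration; a real output file contains more fields than shown here.

```python
import json


def extract_input_content(output_json_text):
    """Return the template payload stored under "inputContent" in an A2I output file."""
    output = json.loads(output_json_text)
    if "inputContent" not in output:
        raise KeyError('no "inputContent" key found in the A2I output')
    return output["inputContent"]


# Made-up, heavily trimmed stand-in for a real A2I output file.
sample_output = json.dumps({
    "humanLoopName": "example-loop",
    "inputContent": {
        "aiServiceRequest": {
            "document": {"s3Object": {"bucket": "example-bucket", "name": "form.png"}}
        }
    },
})

payload = extract_input_content(sample_output)
print(json.dumps(payload, indent=2))
```

Saving the printed payload next to the template gives you a reference for the structure that template expects.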
Thanks, I was looking for this today. The documentation misrepresents the input data: https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-crowd-textract-detection.html
@grantrosse it is good to see it is working for someone :)
@rdali Thank you so much for your help so far.
I am getting an InternalServerException. Please see my code and payload below:
`
import os
import json
import time
import uuid
from urllib.parse import unquote_plus

import boto3


def lambda_handler(event, context):
    textract = boto3.client("textract")
    a2i = boto3.client("sagemaker-a2i-runtime")
    FLOW_ARN = os.environ["FLOW_ARN"]
    if event:
        file_obj = event["Records"][0]
        bucketname = str(file_obj["s3"]["bucket"]["name"])
        filename = unquote_plus(str(file_obj["s3"]["object"]["key"]))
        # Start document analysis for the whole document
        response = textract.start_document_analysis(
            DocumentLocation={
                "S3Object": {
                    "Bucket": bucketname,
                    "Name": filename,
                }
            },
            FeatureTypes=["FORMS"],  # Specify the feature types to analyze
            ClientRequestToken=str(uuid.uuid4()),  # Generate a unique client request token
        )
        # Retrieve the job ID from the response
        job_id = response["JobId"]
        # Poll for the completion of the job
        while True:
            job_status = textract.get_document_analysis(JobId=job_id)['JobStatus']
            if job_status in ['SUCCEEDED', 'FAILED']:
                break
            time.sleep(5)  # Wait for 5 seconds before checking again
        # Get the results of the analysis
        response = textract.get_document_analysis(JobId=job_id)
        # Process the results
        print(json.dumps(response))
        # Extracting the Blocks array from the response
        blocks = response.get("Blocks", [])
        print(json.dumps(blocks))
        document_metadata = response.get("DocumentMetadata", {})
        print(json.dumps(document_metadata))
        #hln = uuid.uuid4().hex
        inputContent = {
            "aiServiceRequest": {
                "document": {
                    "s3Object": {
                        "bucket": bucketname,
                        "name": filename
                    }
                },
                "featureTypes": [
                    "TABLES",
                    "FORMS"
                ],
                "humanLoopConfig": {
                    "dataAttributes": {
                        "contentClassifiers": [
                            "FreeOfAdultContent"
                        ]
                    },
                    "flowDefinitionArn": FLOW_ARN,
                    "humanLoopName": "TheTest"
                }
            },
            "aiServiceResponse": {
                "blocks": blocks,
                "documentMetadata": document_metadata
            },
            "humanTaskActivationConditionResults": {
                "Conditions": [
                    {
                        "And": [
                            {
                                "ConditionType": "ImportantFormKeyConfidenceCheck",
                                "ConditionParameters": {
                                    "ImportantFormKey": "*",
                                    "KeyValueBlockConfidenceLessThan": 99,
                                    "WordBlockConfidenceLessThan": 99
                                }
                            },
                            {
                                "ConditionType": "ImportantFormKeyConfidenceCheck",
                                "ConditionParameters": {
                                    "ImportantFormKey": "*",
                                    "KeyValueBlockConfidenceGreaterThan": 0,
                                    "WordBlockConfidenceGreaterThan": 0
                                }
                            }
                        ]
                    }
                ]
            },
            "selectedAiServiceResponse": {
                "blocks": blocks
            }
        }
        a2i.start_human_loop(
            HumanLoopName="TheTest",
            FlowDefinitionArn=FLOW_ARN,
            HumanLoopInput={
                "InputContent": json.dumps(inputContent)
            }
        )
        return {
            "statusCode": 200,
            "body": json.dumps("Document processed successfully!"),
        }
    return {"statusCode": 500, "body": json.dumps("Issue processing file!")}
`
Below is the error I am getting:
[ERROR] InternalServerException: An error occurred (InternalServerException) when calling the StartHumanLoop operation (reached max retries: 4): Internal Server Error
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 125, in lambda_handler
    a2i.start_human_loop(
  File "/var/lang/lib/python3.12/site-packages/botocore/client.py", line 553, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/lang/lib/python3.12/site-packages/botocore/client.py", line 1009, in _make_api_call
    raise error_class(parsed_response, operation_name)
Not sure what I am doing wrong. Any help would be appreciated.
^ One thing I know for sure is that your blocks won't work without some adjustment. See this Stack Overflow question for an example: https://stackoverflow.com/questions/64302986/how-to-highlight-custom-extractions-using-a2is-crowd-textract-analyze-document
In other words, you need to adjust the casing of the keys on your KEY_VALUE_SET blocks, and trim everything except the text and id from the WORD blocks. Compare rdali's example with the blocks you receive back from Textract and you will see what I mean.
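A minimal sketch of that adjustment, assuming the template wants every key with a lowercase first letter (for example `BlockType` becomes `blockType`) and that WORD blocks only need their type, id, and text. The helper names and the sample blocks are hypothetical, and keeping `blockType` on WORD blocks is my assumption, so check the result against rdali's sample payload.

```python
def lower_first(key):
    """Lowercase only the first character: "BlockType" -> "blockType"."""
    return key[:1].lower() + key[1:]


def camel_case_keys(value):
    """Recursively rename dict keys; lists are walked, scalars are returned as-is."""
    if isinstance(value, dict):
        return {lower_first(k): camel_case_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [camel_case_keys(v) for v in value]
    return value


def adapt_blocks(blocks):
    """Convert Textract blocks to the casing used in the sample payload."""
    adapted = []
    for block in blocks:
        converted = camel_case_keys(block)
        if converted.get("blockType") == "WORD":
            # Assumption: WORD blocks keep only these three fields.
            converted = {k: converted[k] for k in ("blockType", "id", "text") if k in converted}
        adapted.append(converted)
    return adapted


# Made-up blocks shaped like a Textract response, for illustration only.
textract_blocks = [
    {
        "BlockType": "KEY_VALUE_SET",
        "Id": "kv-1",
        "Confidence": 91.5,
        "EntityTypes": ["KEY"],
        "Geometry": {"BoundingBox": {"Width": 0.1, "Height": 0.02, "Left": 0.3, "Top": 0.4}},
        "Relationships": [{"Type": "CHILD", "Ids": ["w-1"]}],
    },
    {
        "BlockType": "WORD",
        "Id": "w-1",
        "Text": "Name:",
        "Confidence": 99.1,
        "Geometry": {"BoundingBox": {"Width": 0.05, "Height": 0.02, "Left": 0.3, "Top": 0.4}},
    },
]

adapted_blocks = adapt_blocks(textract_blocks)
```

In the Lambda above, the adapted list would take the place of `blocks` in both `aiServiceResponse` and `selectedAiServiceResponse`.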
Okay, I will check. Just to confirm, the template I am using is the default template, not a custom one. I hope this does not make any difference to the payload.
Thank you @grantrosse, my code is working now after changing the JSON keys from title case to camel case. But my human loop status is now Failed. When I check the error, it shows:
ValidationError Task failed to render: [ InvalidParameters: '"grant_read_access" input is not a valid S3 URI: " ".' ].
I am using a custom template that is identical to the default key-value pair template.
Thanks, Ritesh
If you look at the code of the default template, you can see that the S3 URI is constructed in Liquid as follows:
{% capture s3_uri %}s3://{{ task.input.aiServiceRequest.document.s3Object.bucket }}/{{ task.input.aiServiceRequest.document.s3Object.name }}{% endcapture %}
Make sure that your bucket and s3Object.name values do not contain extra slashes or stray characters.
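One way to catch this before starting the human loop is to build the same URI in Python from the payload and fail early if either part is missing or malformed. This is only a sketch: `build_s3_uri` and `s3_uri_from_payload` are hypothetical helpers, and the checks are my guess at what "extra slashes or characters" would cover, not the validation A2I itself performs.

```python
def build_s3_uri(bucket, name):
    """Mirror the template's capture: s3://<bucket>/<name>, with basic sanity checks."""
    bucket = (bucket or "").strip()
    name = (name or "").strip()
    if not bucket or not name:
        raise ValueError("bucket and object name must both be non-empty")
    if "/" in bucket:
        raise ValueError(f"bucket name must not contain slashes: {bucket!r}")
    if name.startswith("/"):
        raise ValueError(f"object name must not start with a slash: {name!r}")
    return f"s3://{bucket}/{name}"


def s3_uri_from_payload(input_content):
    """Read the same path the template reads: aiServiceRequest.document.s3Object."""
    s3_object = (
        input_content.get("aiServiceRequest", {})
        .get("document", {})
        .get("s3Object", {})
    )
    return build_s3_uri(s3_object.get("bucket"), s3_object.get("name"))


# Made-up payload fragment for illustration.
example_payload = {
    "aiServiceRequest": {
        "document": {"s3Object": {"bucket": "example-bucket", "name": "forms/form.png"}}
    }
}
example_uri = s3_uri_from_payload(example_payload)
```

If the keys in the payload are cased differently from what the template reads (for example `S3Object` instead of `s3Object`), the lookup comes back empty and the helper raises instead of producing a blank URI.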