media-insights-on-aws
Support organizations with an AI opt-out policy
Issue #, if available:
#699
Description of changes:
Let's always save Transcribe results to the root of the dataplane bucket instead of using a bucket hosted by the Transcribe service.
Fixes this error:
An error occurred (BadRequestException) when calling the StartTranscriptionJob operation: You must specify a value for outputBucketName because your account, XXXXXXXXX, has opted out of using your content for quality improvements. Make sure you provide a value for outputBucketName and try your request again.
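The core of the change is that StartTranscriptionJob now always receives an output bucket. The sketch below shows the equivalent request through the AWS CLI rather than the PR's Python Lambda code. The function name `start_transcription_to_bucket` and the job, media, and bucket values are placeholders. As far as the Amazon Transcribe documentation describes it, omitting an output key places the transcript at the bucket root as `<job-name>.json`, which matches the "root of the dataplane bucket" behavior described above.

```shell
#!/usr/bin/env bash
# Sketch: start a Transcribe job whose output goes to a bucket you own.
# Accounts that have opted out of AI service content use must pass
# --output-bucket-name; without it StartTranscriptionJob fails with the
# BadRequestException above. Function, job, and bucket names are placeholders.

start_transcription_to_bucket() {
  local job_name="$1"       # must be unique among your Transcribe jobs
  local media_uri="$2"      # s3:// URI of the input media file
  local output_bucket="$3"  # e.g. the MIE dataplane bucket
  local language="${4:-en-US}"

  # No --output-key is given, so the transcript should land at the bucket root.
  aws transcribe start-transcription-job \
    --transcription-job-name "$job_name" \
    --language-code "$language" \
    --media "MediaFileUri=$media_uri" \
    --output-bucket-name "$output_bucket"
}
```

For example, `start_transcription_to_bucket demo-job s3://my-dataplane-bucket/input/video.mp4 my-dataplane-bucket` should write `demo-job.json` to the root of `my-dataplane-bucket`.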
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
I'm not sure whether the root of the dataplane bucket is the best place to have Transcribe save job files.
If you need this fix before it has been merged into an MIE release, run the following commands:
export MIE_STACK_NAME=$(aws cloudformation describe-stacks --region $AWS_DEFAULT_REGION --query 'Stacks[?contains(StackName, `MieStack`)].StackName | [-1]' | tr -d '"' | awk -F '-MieStack-.*-' '{print $1}')
export MIE_OPERATOR_STACK_NAME=$(aws cloudformation list-stacks --region $AWS_DEFAULT_REGION --query 'StackSummaries[?starts_with(StackName,`'$MIE_STACK_NAME-OperatorLibrary'`) && StackStatus==`CREATE_COMPLETE`].StackName' --output text)
FUNCTION_NAME=$(aws --region $AWS_DEFAULT_REGION cloudformation list-stack-resources --stack-name $MIE_OPERATOR_STACK_NAME --no-paginate --query 'StackResourceSummaries[?LogicalResourceId==`CheckTranscribeFunction`].PhysicalResourceId' --output text)
wget https://raw.githubusercontent.com/aws-solutions/media-insights-on-aws/9ad1711de00ae278dbf71b0e65d3dc52fa987852/source/operators/transcribe/get_transcribe.py
zip get_transcribe.zip get_transcribe.py
aws lambda update-function-code --function-name $FUNCTION_NAME --zip-file fileb://get_transcribe.zip --region $AWS_DEFAULT_REGION
FUNCTION_NAME=$(aws --region $AWS_DEFAULT_REGION cloudformation list-stack-resources --stack-name $MIE_OPERATOR_STACK_NAME --no-paginate --query 'StackResourceSummaries[?LogicalResourceId==`StartTranscribeFunction`].PhysicalResourceId' --output text)
wget https://raw.githubusercontent.com/aws-solutions/media-insights-on-aws/9ad1711de00ae278dbf71b0e65d3dc52fa987852/source/operators/transcribe/start_transcribe.py
zip start_transcribe.zip start_transcribe.py
aws lambda update-function-code --function-name $FUNCTION_NAME --zip-file fileb://start_transcribe.zip --region $AWS_DEFAULT_REGION
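The two patch sequences above differ only in the CloudFormation logical resource ID and the Python file name. The sketch below folds them into one hypothetical helper, `patch_operator_lambda` (not part of MIE), assuming `AWS_DEFAULT_REGION` and `MIE_OPERATOR_STACK_NAME` are already exported. It also waits for each code update to finish with `aws lambda wait function-updated`, which assumes a reasonably recent AWS CLI.

```shell
#!/usr/bin/env bash
# Sketch: wrap the repeated "look up, download, zip, upload" steps in one helper.
# patch_operator_lambda is a hypothetical name; it is not part of MIE.
# Assumes AWS_DEFAULT_REGION and MIE_OPERATOR_STACK_NAME are already exported.

# Pin the same commit that the PR commands download from.
MIE_COMMIT=9ad1711de00ae278dbf71b0e65d3dc52fa987852
RAW_BASE="https://raw.githubusercontent.com/aws-solutions/media-insights-on-aws/${MIE_COMMIT}/source/operators/transcribe"

patch_operator_lambda() {
  local logical_id="$1"    # CloudFormation logical ID, e.g. StartTranscribeFunction
  local source_file="$2"   # handler file name, e.g. start_transcribe.py
  local archive="${source_file%.py}.zip"
  local function_name

  # Resolve the physical Lambda name from the operator library stack.
  function_name=$(aws --region "$AWS_DEFAULT_REGION" cloudformation list-stack-resources \
    --stack-name "$MIE_OPERATOR_STACK_NAME" --no-paginate \
    --query "StackResourceSummaries[?LogicalResourceId==\`${logical_id}\`].PhysicalResourceId" \
    --output text) || return 1
  if [ -z "$function_name" ] || [ "$function_name" = "None" ]; then
    echo "No resource $logical_id found in $MIE_OPERATOR_STACK_NAME" >&2
    return 1
  fi

  # Download the patched handler and zip it at the archive root.
  wget -q -O "$source_file" "$RAW_BASE/$source_file" || return 1
  rm -f "$archive"
  zip -q "$archive" "$source_file" || return 1

  # Upload the new code, then block until Lambda reports the update as finished.
  aws lambda update-function-code --region "$AWS_DEFAULT_REGION" \
    --function-name "$function_name" \
    --zip-file "fileb://$archive" >/dev/null || return 1
  aws lambda wait function-updated --region "$AWS_DEFAULT_REGION" \
    --function-name "$function_name"
}
```

With the helper defined, the patch reduces to `patch_operator_lambda CheckTranscribeFunction get_transcribe.py` followed by `patch_operator_lambda StartTranscribeFunction start_transcribe.py`.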
I verified that this works for me and fixes the outputBucketName error. I deployed it as part of the Content Localization on AWS solution: https://aws.amazon.com/solutions/implementations/content-localization-on-aws/
I'm running in the us-east-1 region, so don't forget to set your region with the command below before running the commands above:
export AWS_DEFAULT_REGION=us-east-1
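Because the patch commands read `AWS_DEFAULT_REGION` and the discovered stack names, a quick check before patching can catch an unset region or a stack lookup that matched nothing or several stacks. The helper name `check_patch_prereqs` is hypothetical, and the fallback to `aws configure get region` is an assumption about how your CLI profile is set up.

```shell
#!/usr/bin/env bash
# Sketch of a preflight check to run before the patch commands.
# check_patch_prereqs is a hypothetical helper, not part of MIE.

check_patch_prereqs() {
  # Fall back to the region configured for the current CLI profile.
  if [ -z "${AWS_DEFAULT_REGION:-}" ]; then
    AWS_DEFAULT_REGION=$(aws configure get region 2>/dev/null)
  fi
  if [ -z "$AWS_DEFAULT_REGION" ]; then
    echo "Region not set; run: export AWS_DEFAULT_REGION=us-east-1" >&2
    return 1
  fi
  export AWS_DEFAULT_REGION

  # With --output text, a lookup typically yields an empty string (or "None")
  # when nothing matched, and whitespace-separated names when several did.
  case "${MIE_OPERATOR_STACK_NAME:-}" in
    "" | None)
      echo "MIE_OPERATOR_STACK_NAME is empty; check the stack lookup" >&2
      return 1 ;;
    *[[:space:]]*)
      echo "Several operator stacks matched: $MIE_OPERATOR_STACK_NAME" >&2
      return 1 ;;
  esac
  echo "Patching in $AWS_DEFAULT_REGION, stack $MIE_OPERATOR_STACK_NAME"
}
```

Running `check_patch_prereqs` right after the two `export` commands that discover the stacks, and before the first `FUNCTION_NAME=` lookup, stops the patch early instead of uploading code to the wrong place.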