media-insights-on-aws

Support organizations with an AI opt-out policy

Open ianwow opened this issue 2 years ago • 3 comments

Issue #, if available:

#699

Description of changes:

Let's always save Transcribe results to the root of the dataplane bucket instead of using a bucket hosted by the Transcribe service.

Fixes this error:

An error occurred (BadRequestException) when calling the StartTranscriptionJob operation: You must specify a value for outputBucketName because your account, XXXXXXXXX, has opted out of using your content for quality improvements. Make sure you provide a value for outputBucketName and try your request again.
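For illustration, here is a minimal sketch of the kind of request that avoids this error, written as a dry run against the AWS CLI. It only prints the command rather than calling Transcribe. The job name, media URI, bucket name, and the en-US language code are hypothetical placeholders, and this is not the code in this pull request, which changes the Python operator instead.

```shell
# Sketch only: all values passed in below are hypothetical placeholders.
# Prints (does not run) an AWS CLI command that starts a Transcribe job
# with an explicit output bucket, as required for opted-out accounts.
build_transcribe_cmd() {
  # $1 = job name, $2 = media S3 URI, $3 = output bucket name
  if [ -z "$3" ]; then
    echo "an output bucket name is required" >&2
    return 1
  fi
  printf 'aws transcribe start-transcription-job --transcription-job-name %s --language-code en-US --media MediaFileUri=%s --output-bucket-name %s\n' "$1" "$2" "$3"
}

# Dry run with placeholder values
build_transcribe_cmd "example-job" "s3://example-dataplane-bucket/input/audio.mp3" "example-dataplane-bucket"
```

The point is the --output-bucket-name argument: when it is supplied, Transcribe writes results to a bucket you own instead of a service-hosted one.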

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

ianwow avatar Mar 18 '22 21:03 ianwow

I'm not sure if the root of the Dataplane bucket is the best place to tell Transcribe to save job files.

ianwow avatar Mar 21 '22 20:03 ianwow

If you need this fix before it has been merged into an MIE release, then run the following commands:

export MIE_STACK_NAME=$(aws cloudformation describe-stacks --region us-east-1  --query 'Stacks[?contains(StackName, `MieStack`)].StackName | [-1]' | tr -d '"' | awk -F '-MieStack-.*-' '{print $1}')

export MIE_OPERATOR_STACK_NAME=$(aws cloudformation list-stacks --region us-east-1 --query 'StackSummaries[?starts_with(StackName,`'$MIE_STACK_NAME-OperatorLibrary'`) && StackStatus==`CREATE_COMPLETE`].StackName' --output text)

FUNCTION_NAME=$(aws --region $AWS_DEFAULT_REGION cloudformation list-stack-resources --stack-name $MIE_OPERATOR_STACK_NAME --no-paginate --query 'StackResourceSummaries[?LogicalResourceId==`CheckTranscribeFunction`].PhysicalResourceId' --output text)
wget https://raw.githubusercontent.com/aws-solutions/media-insights-on-aws/9ad1711de00ae278dbf71b0e65d3dc52fa987852/source/operators/transcribe/get_transcribe.py
zip get_transcribe.zip get_transcribe.py
aws lambda update-function-code --function-name $FUNCTION_NAME --zip-file fileb://get_transcribe.zip --region us-east-1

FUNCTION_NAME=$(aws --region $AWS_DEFAULT_REGION cloudformation list-stack-resources --stack-name $MIE_OPERATOR_STACK_NAME --no-paginate --query 'StackResourceSummaries[?LogicalResourceId==`StartTranscribeFunction`].PhysicalResourceId' --output text)
wget https://raw.githubusercontent.com/aws-solutions/media-insights-on-aws/9ad1711de00ae278dbf71b0e65d3dc52fa987852/source/operators/transcribe/start_transcribe.py
zip start_transcribe.zip start_transcribe.py
aws lambda update-function-code --function-name $FUNCTION_NAME --zip-file fileb://start_transcribe.zip --region us-east-1

ianwow avatar Apr 25 '22 23:04 ianwow

I verified that this works for me and fixes the outputBucketName issue. I deployed it on the AWS solution https://aws.amazon.com/solutions/implementations/content-localization-on-aws/

I'm running in the us-east-1 region, so don't forget to run the command below to set your region first.

export AWS_DEFAULT_REGION=us-east-1
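Since the commands above reference $AWS_DEFAULT_REGION, a small guard can catch a missing value before anything is run. This is only a sketch; the function name require_region is made up for illustration.

```shell
# Sketch only: require_region is a hypothetical helper name.
# Fails early when AWS_DEFAULT_REGION is unset or empty.
require_region() {
  if [ -z "${AWS_DEFAULT_REGION:-}" ]; then
    echo "AWS_DEFAULT_REGION is not set; run: export AWS_DEFAULT_REGION=<your-region>" >&2
    return 1
  fi
  echo "Using region: $AWS_DEFAULT_REGION"
}
```

Call require_region once after exporting the variable and before running the patch commands.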

andychen23 avatar May 06 '22 18:05 andychen23