
iam_role for the redshift_unload operator

Status: Open. davehowell opened this issue 5 years ago (0 comments).

I am not able to use the Amazon Web Services redshift_unload operator. I do not use a permanent aws_access_key and aws_secret_key; instead, I obtain temporary credentials along with a session token.

I can successfully use the redshift operator, so I know that the UnloadConfig is working fine to discover my credentials.

The redshift_unload operator, however, throws an error saying that the secrets are required. So I tried setting both secrets, aws.redshift_unload.access_key_id and aws.redshift_unload.secret_access_key, to my temporary credentials.

If I run the dag with temp_credentials: true, I get this error:

com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: The security token included in the request is invalid.

If I run the dag with temp_credentials: false, I get a different error:

com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records.

As I understand it, the operator expects permanent credentials that are either used directly or used to generate new temporary credentials (or federated credentials, which do not apply in my case).

I can see that the UNLOAD statement differs from general queries because it requires authorization parameters, but AWS recommends IAM_ROLE in its docs: "For increased security and flexibility, we recommend using IAM role-based access control."

The redshift_unload operator is hard-coded to use a CREDENTIALS clause in the UnloadStatement.
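For reference, the Redshift documentation describes the CREDENTIALS form for temporary (STS) credentials as the key pair plus a token= field holding the session token. A minimal, hypothetical sketch of building that string follows; the class and method names are illustrative only and do not reflect how digdag's UnloadStatement actually assembles the clause.

```java
import java.util.Objects;

/** Hypothetical helper (not part of digdag): builds a Redshift CREDENTIALS clause. */
final class CredentialsClause {
    private CredentialsClause() {}

    // Wrap a value as a Redshift string literal, doubling any embedded single quotes.
    static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    /**
     * Permanent keys: pass sessionToken = null.
     * Temporary keys: the session token must be included as the token= field.
     */
    static String build(String accessKeyId, String secretAccessKey, String sessionToken) {
        Objects.requireNonNull(accessKeyId, "accessKeyId");
        Objects.requireNonNull(secretAccessKey, "secretAccessKey");
        StringBuilder sb = new StringBuilder()
                .append("aws_access_key_id=").append(accessKeyId)
                .append(";aws_secret_access_key=").append(secretAccessKey);
        if (sessionToken != null) {
            sb.append(";token=").append(sessionToken);
        }
        return "CREDENTIALS " + quote(sb.toString());
    }

    public static void main(String[] args) {
        // Placeholder values only.
        System.out.println(build("ASIAEXAMPLE", "secretEXAMPLE", "tokenEXAMPLE"));
    }
}
```

Temporary key IDs are generally only valid together with their session token, so a clause built from the key pair alone would plausibly be rejected. That is consistent with, though not proof of, the "Access Key Id ... does not exist" error above.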

I propose that there should be an iam_role option. If it is set, the UNLOAD statement should be built with an IAM_ROLE clause instead of a CREDENTIALS clause.
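The proposal can be sketched as follows, assuming a hypothetical iamRole config value passed in as an Optional (none of these names are digdag's actual API). The CREDENTIALS clause is taken as a Supplier so that credentials are only looked up when no role is configured.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch of the proposed iam_role option; names are illustrative. */
final class UnloadStatementSketch {
    private UnloadStatementSketch() {}

    // Wrap a value as a Redshift string literal, doubling embedded single quotes
    // (UNLOAD requires quotes inside the query text to be doubled).
    static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    /** IAM_ROLE when a role ARN is configured; otherwise fall back to CREDENTIALS. */
    static String authorizationClause(Optional<String> iamRole, Supplier<String> credentialsClause) {
        return iamRole
                .map(arn -> "IAM_ROLE " + quote(arn))
                // orElseGet: the credentials supplier is only invoked when no role is set.
                .orElseGet(credentialsClause);
    }

    static String build(String query, String s3Prefix,
                        Optional<String> iamRole, Supplier<String> credentialsClause) {
        return "UNLOAD (" + quote(query) + ") TO " + quote(s3Prefix) + " "
                + authorizationClause(iamRole, credentialsClause);
    }

    public static void main(String[] args) {
        System.out.println(build(
                "select * from users",
                "s3://my-bucket/unload/users_",
                Optional.of("arn:aws:iam::123456789012:role/MyRedshiftRole"),
                () -> { throw new IllegalStateException("credentials not needed"); }));
    }
}
```

With a role configured, the sketch prints UNLOAD ('select * from users') TO 's3://my-bucket/unload/users_' IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole' without ever touching the credentials supplier.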

This call to createBaseCredential is also an issue: it fetches the credentials from secrets whether or not you want them from there, which causes my original error. The credentials serve two purposes in this operator: (1) the JDBC connection to Redshift and (2) building the CREDENTIALS clause, and you may want these to be different. It also implicitly makes these secrets compulsory, which they shouldn't be, given the numerous ways the AWS SDKs can cascade through credential discovery.
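One way to make the secrets optional is to treat them as one source in an ordered chain, in the spirit of the AWS SDK's provider chain. The sketch below is hypothetical and does not use the SDK: the Creds record, the session_token secret name, and the chain order are assumptions for illustration. The environment source reads the standard AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN variables, and a missing source yields "not found" rather than an error.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch: resolve credentials from an ordered chain of optional sources. */
final class CredentialChainSketch {
    private CredentialChainSketch() {}

    record Creds(String accessKeyId, String secretAccessKey, String sessionToken) {}

    /** Returns the first source that yields credentials; fails only if every source is empty. */
    static Creds resolve(List<Supplier<Optional<Creds>>> sources) {
        for (Supplier<Optional<Creds>> source : sources) {
            Optional<Creds> found = source.get();
            if (found.isPresent()) {
                return found.get();
            }
        }
        throw new IllegalStateException("no credentials found in any source");
    }

    /** Secrets-backed source; empty (not an error) when the secrets are absent. */
    static Supplier<Optional<Creds>> fromSecrets(Map<String, String> secrets, String prefix) {
        return () -> {
            String id = secrets.get(prefix + ".access_key_id");
            String key = secrets.get(prefix + ".secret_access_key");
            if (id == null || key == null) {
                return Optional.empty();
            }
            // ".session_token" is a hypothetical secret name used for illustration.
            return Optional.of(new Creds(id, key, secrets.get(prefix + ".session_token")));
        };
    }

    /** Environment-backed source using the standard AWS variable names. */
    static Supplier<Optional<Creds>> fromEnvironment(Map<String, String> env) {
        return () -> {
            String id = env.get("AWS_ACCESS_KEY_ID");
            String key = env.get("AWS_SECRET_ACCESS_KEY");
            if (id == null || key == null) {
                return Optional.empty();
            }
            return Optional.of(new Creds(id, key, env.get("AWS_SESSION_TOKEN")));
        };
    }

    public static void main(String[] args) {
        // No redshift_unload secrets configured: resolution falls through to the
        // (fake) environment instead of failing outright.
        Map<String, String> fakeEnv = Map.of(
                "AWS_ACCESS_KEY_ID", "ASIAEXAMPLE",
                "AWS_SECRET_ACCESS_KEY", "secretEXAMPLE",
                "AWS_SESSION_TOKEN", "tokenEXAMPLE");
        Creds unloadCreds = resolve(List.of(
                fromSecrets(Map.of(), "aws.redshift_unload"),
                fromEnvironment(fakeEnv)));
        System.out.println(unloadCreds.accessKeyId());
    }
}
```

Because each use gets its own chain, the JDBC connection and the UNLOAD authorization could be resolved independently (or the latter skipped entirely when an IAM role is configured).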

davehowell, Oct 25 '19 14:10