aws-sdk-go
aws-sdk-go copied to clipboard
[Feature request] Refresh credentials from file
This is a feature request to support reading temporary credentials from a file i.e. the shared credentials file, and refresh those credentials when they expire assuming the file has been updated by some external source.
Other credential providers like the assume role provider implements refresh of credentials and the docstring for the SharedCredentialsProvider also suggests to support this (at least how I read it) but unfortunately it only reads the credentials once and implements a "fake" IsExpired() method.
Given that you can store temporary credentials in the shared credentials file already it would make sense if the SDK could automatically reload the file once the credentials expire. This way you could have some external tool that refreshes the credentials in the file and the SDK would not have to be restarted to reload them.
My specific use case is to solve a problem of distributing AWS IAM credentials to Pods in Kubernetes clusters hopefully helping a lot of EKS users in the future! I have written a controller that can do this (README explains how it solves some problems not fixed in other solutions). However it relies on the fact that the AWS SDKs can refresh credentials from a file, which unfortunately isn't the case at the moment and the reason why I'm reaching out.
I realize that this fix needs to be done in every AWS SDK, but Go is what I'm most familiar with so my hope is I can start the discussion here and maybe you can help me get in contact with the right people in case this needs coordination across different AWS SDK implementations.
I have already talked with the EKS team about this, but they did not have bandwidth to focus on this topic at the moment.
I have done my own quick implementation in the Go SDK. Feedback very welcome also if I should approach it differently. E.g. I have not considered any possible backwards compatible issues with this change.
https://github.com/aws/aws-sdk-go/compare/master...mikkeloscar:file-refresh
For reference here is a community driven comparison of solutions to the problem of distributing AWS IAM credentials in Kubernetes clusters.
Hello @mikkeloscar, thank you for reaching out to us and proposing this. We have a very similar concept proposed. I'll mark this as a feature request and bring this up in our next sprint.
@xibz awesome! If there's anything I can do to help out, just let me know!
@xibz Any news on this? I'll be happy to help in any way I can to get this going.
@mikkeloscar thanks again for reaching out to us with this feature request. We decided that while this feature can be useful, bundling it into the SDKs introduces the potential for race conditions, and cross platform compatibility support. Performing atomic file locking and change notification do not have feature parity across platforms, and can introduce unexpected instability.
I think there are a couple alternative ways your IAM Controller for Kubernetes could solve the EC2 Metadata request timeout (<1s) issue you're seeing.
Generic HTTP Credential provider
The SDK includes a Generic HTTP Credential Provider which will attempt to retrieve credentials from an HTTP endpoint on the 169.254.170.2 IP. This credential provider supports expiration just like EC2 Instance roles. Your application(s) can use this credential provider directly via code, or rely on the SDK's default credential chain using the environment variable, AWS_CONTAINER_CREDENTIALS_RELATIVE_URI to retrieve credentials from an HTTP endpoint similar to what your Kubernetes Controller is doing with the EC2 Metadata proxy. The AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable's value is a URI path that the SDK will make an HTTP request to. Even better, this credential provider is supported by the default chain across all of the AWS SDKs.
In the following example the SDK will make an HTTP request to http://169.254.170.2/MyCreds, if no other credentials are found in the credential chain, (i.e no environment, nor file credentials)
AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=/MyCreds ./MyApp
The HTTP response body is a JSON value containing the credentials the SDK should use. See the IAM Roles for Tasks documentation for more details of this JSON value.
{
"AccessKeyId": "ACCESS_KEY_ID",
"Expiration": "EXPIRATION_DATE",
"RoleArn": "TASK_ROLE_ARN",
"SecretAccessKey": "SECRET_ACCESS_KEY",
"Token": "SECURITY_TOKEN_STRING"
}
aws.Config.EC2MetadataDisableTimeoutOverride
Alternatively, setting the aws.Config.EC2MetadataDisableTimeoutOverride configuration value to true within your Go application will prevent the SDK from using a significantly shorter request timeout, in comparison to the <1s timeout used by default when making EC2 Metadata requests.
For this solution I think we can also consider adding support for an environment variable which uses non-shortened EC2 Metadata request timeout by default without modifying the application(s) running in the pod directly.
@jasdel Thanks for getting back to me on this.
While I really appreciate you giving a reason for not wanting to go forward with this feature request, I'm sad that you will not consider it because of the reasons you give.
I don't see what file locking issues there would be. In my mind the implementation should just be a simple read of the file, and if the credentials, at the time of reading, are up to date then fine, otherwise it would just fail to get credentials. It would be up to some external tool to keep the files up to date.
As an anecdote we implemented something similar for our IAM infrastructure in 2016 at Zalando and are currently using this in 85 Kubernetes clusters. Our experience is that this approach is much more stable than anything involving the network.
I have ofc. considered what you propose, and it has also been discussed multiple times in the Kubernetes community. However I still think that the approach is so much more complicated to get right, that I really don't see the reason for not having the simple file read support in the SDKs. If you have something that calls an HTTP endpoint, you need to ensure that this endpoint is available at all times, this mean you need to run an HA setup in order to do updates of you component. Additionally you need to have retries in your clients to account for network issues, something you most likely won't need reading from a local file as you would have bigger problems if this started to fail.
If I have to ask users to set aws.Config.EC2MetadataDisableTimeoutOverride in the applications then I might as well ask them to use a custom SDK with file refresh support, so I don't think this is so helpful. Having an environment variable would be helpful, but I still consider this a hack rather than a robust solution.
I wish you would reconsider this feature request, but if not, there is nothing I can do about it. Either way, thanks for looking at my request.
Thanks for the feedback and update @mikkeloscar.
In the context of Kubernetes our concerns with the refreshing from a file are probably manageable. Given that Kubernetes would own the environment be responsible for writing the file as needed. Our biggest concern with this file being refreshed is two factor. Usage outside of Kubernetes, and that the shared configuration and shared credential files are interchangeable for the most part. This means that there are non-credential related information that would be refreshed. The additional information in the files doesn't make sense to be refreshed during the lifetime of an application.
A http endpoint for retrieving credentials per pod provides the best flexibility for applications of users. The issue with only being able share a single environment variable for AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is problematic. Especially, if pods should have different roles they assume. But if all Pods are able to use the same environment variable using this method of setting up the HTTP endpoint is the best. In addition SDKs will not apply short timeouts for the endpoint since it must be explicitly enabled unlike EC2 Instance roles which is automatic.
I think further discussing EC2MetadataDisableTimeoutOverride as an environment variable makes absolute sense. Ideally this environment variable would be usable across all the SDKs.
@jasdel thanks for providing an update on this!
Our biggest concern with this file being refreshed is two factor. Usage outside of Kubernetes, and that the shared configuration and shared credential files are interchangeable for the most part. This means that there are non-credential related information that would be refreshed. The additional information in the files doesn't make sense to be refreshed during the lifetime of an application.
If this is such a big concern, then let me propose a new credentials provider for this use case. We could call it RefreshableFileCredentials or something like that and it could rely on a file with the same content as what you get from an assume role i.e.:
{
"AccessKeyId": "ACCESS_KEY_ID",
"Expiration": "EXPIRATION_DATE",
"RoleArn": "TASK_ROLE_ARN",
"SecretAccessKey": "SECRET_ACCESS_KEY",
"Token": "SECURITY_TOKEN_STRING"
}
It could even be enabled ONLY when a certain environment variable is set, e.g.: AWS_REFRESHABLE_FILE_CREDENTIALS=/path/to/file. This way it would not affect users outside of Kubernetes (or other environments where this could be useful).
I don't mind writing a proper proposal for how this would work, I just don't want to spend time on it if there's no chance it could be considered.
Btw. today I looked into the code of the AWS SDK for Java and found out that it does support some form of file refreshing. It does so with a fixed refresh interval of 5 min. and reloads the file if it was changed since last check. While this is not the nicest solution, it actually solves the problem. I made a simple example application to verify that it does indeed work. This basically means the problem is solved for all JVM users which is nice.
I don't suggest that it should be done the same way for the Go SDK, but considering that the Java SDK has support, would you be more willing to add support (in some form) to the Go SDK?
A http endpoint for retrieving credentials per pod provides the best flexibility for applications of users.
I don't agree. Users don't care how it's implemented. They care that it just works and they don't have to jump through a million hoops. To make an HTTP endpoint implementation "just work" is much more complex than what I propose. For reference this is the current HTTP endpoint based proposal from the community. It's very complex..
I think further discussing
EC2MetadataDisableTimeoutOverrideas an environment variable makes absolute sense. Ideally this environment variable would be usable across all the SDKs.
I would really prefer to spend effort on fixing the root cause instead of implementing hacks. For me this is equivalent to putting sleeps in your code to work around race conditions. That's why I don't like it because it moves the problem away from the SDKs to other components that becomes harder to implement in a good way because of that. When a change is needed to all SDKs anyway, why not make a change that can solve the problem at the lowest level, and solves the problem for good?
Hello @mikkeloscar, thank you for pointing out some various points on this feature. I've went ahead and started working with EKS to see what solutions may be possible. Once we land on something concrete, I'll update this issue and hopefully have a PR with the implementation.
@xibz That sounds like great news!
Do you have any rough estimation on the time frame for this? It's hurting us a lot not having a robust solution and we are able to help any way you need. We have 100 Kubernetes clusters on AWS we can use for testing this! I can also write a full proposal describing the idea described above if this would be helpful, just let me know!
Thanks for picking this up again!!
Thanks @mikkeloscar we'll be glad to review changes when we get closer to having a design for this feature. We'll be working with the other AWS SDKs in our designs to create a consistent experience and support.
Using a JSON file instead of the ini-like shared config file addresses the majority of our concerns about partial reads. The Go structure with a JSON file most likely would be similar to the endpointcreds.getCredentialsOutput type. The Expiration member would be serialized from a string formatted as RFC3339 time, e.g. (2006-01-02T15:04:05Z).
type RefreshableCredentials struct {
Expiration *time.Time
AccessKeyID string
SecretAccessKey string
Token string
}
In you're example you included a RoleArn member. Could you go into details more about the usage of this value? In general SDKs load this value from the shared config file (~/.aws/config) when loading configuration. Currently most SDKs don't support assuming a role with credentials which are not defined in the shared config/credential file (#1019), but I think also updating the SDK to support assuming a role with additional credentials would be the best way to support the Assume Role use case the RoleArn looks to facilitate.
In you're example you included a RoleArn member. Could you go into details more about the usage of this value?
Sorry, this I copied verbatim from your comment (comes from here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html). I didn't actually have anything in mind for the RoleArn. The struct you suggest covers everything that would be needed and is actually what I had in mind.
I did also not consider the Assume Role use case. I would expect whatever provides the credentials would already have performed the assuming of a role if needed. This I would simply consider out of scope.
Thanks for the clarification. I missed that when making my post. Agreed, assume role case would be a nice to have, but would probably not be a blocker.
Following this; I'm a bit confused on the technical impediment.
Calls to rename(2) are atomic; so external tools refreshing AWS credential files need only to write them to temporary file and rename them into place. The Go SDK needs only to call os.Stat periodically (possibly 5 minutes like the Java SDK) to see if the ModTime has changed and return true in IsExpired if the modification time has changed.
The Go SDK needs only to call os.Stat periodically (possibly 5 minutes like the Java SDK) to see if the ModTime has changed and return true in IsExpired if the modification time has changed.
You don't need to do it periodically, you already read credentials which is valid for a certain period the first time the file was read, you only need to refresh those once they expire so you can simply read the file every time the credentials expire, and it could be changed a million times in between without you having to care about it :)
Hey, is there any headway on this feature?
It appears that there's logic in the AfterRetryHandler that marks a credential as expired whenever we receive an expired exception from AWS: https://github.com/aws/aws-sdk-go/blob/9d1f49ba63bdac44a5b9f4d736e79d792e389e8a/aws/corehandlers/handlers.go#L209-L214
The current code uses a StaticProvider, which means that such refresh mechanisms would not work with a shared credential file. I have submitted a PR addressing that issue: https://github.com/aws/aws-sdk-go/pull/5073. It should also indirectly addresses this feature request. With that change, one could just periodically update the shared credentials file. As long as the file gets updated before it expires, the SDK should automatically re-read the credential file when it receives an expired token exception.
is there any update on this feature request? We have similar use case for using open telemetry and reading temporary credentials from shared credentials file.
We're not going to implement this, this type of credential setup isn't present anywhere in the AWS ecosystem to my knowledge (creds that will need refreshed are always vended from an explicitly dynamic source e.g. IMDS, eks/ecs endpointcreds, etc.) and none of the SDKs are implemented to refresh file-based credentials accordingly.
Our recommendation as of right now is to either implement a local credentials service to be retrieved with endpointcreds / IMDS or roll your own logic to load credentials from the ~/.aws/credentials file.
⚠️COMMENT VISIBILITY WARNING⚠️
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.