external-dns
external-dns copied to clipboard
feat(aws): add tls ca provision
Description
Adds TLS CA, TLS Client Cert and key to AWS Provider.
Uses custom http transport to feed to aws config default options if tls config is provided.
Fixes #5026
Checklist
- [x] Unit tests updated
- [x] End user documentation updated
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign szuecs for approval. For more information see the Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Hi @hjoshi123. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/retitle feat(aws): add tls ca provision /ok-to-test @hjoshi123 Would you please update documentation explaining users that it exists and how to use it ?
@ivankatliarchuk @mloiseleur not sure what is up with the lint job. In my local make go-lint is working without any errors. This is the error in CI. The tests are passing now
jsonschema: "linters-settings" does not validate with "/properties/linters-settings/additionalProperties": additional properties 'maligned' not allowed
jsonschema: "run" does not validate with "/properties/run/additionalProperties": additional properties 'exclude-files' not allowed
Failed executing command with error: the configuration contains invalid elements
, Error: Command failed: /home/runner/golangci-lint-1.63.4-linux-amd64/golangci-lint config verify
jsonschema: "linters-settings" does not validate with "/properties/linters-settings/additionalProperties": additional properties 'maligned' not allowed
jsonschema: "run" does not validate with "/properties/run/additionalProperties": additional properties 'exclude-files' not allowed
Failed executing command with error: the configuration contains invalid elements
at genericNodeError (node:internal/errors:984:15)
at wrappedFn (node:internal/errors:5[38](https://github.com/kubernetes-sigs/external-dns/actions/runs/13348358593/job/37281685858?pr=5097#step:7:41):14)
at ChildProcess.exithandler (node:child_process:422:12)
at ChildProcess.emit (node:events:519:28)
at maybeClose (node:internal/child_process:1105:16)
at ChildProcess._handle.onexit (node:internal/child_process:305:5)
Error: Command failed: /home/runner/golangci-lint-1.63.4-linux-amd64/golangci-lint config verify
jsonschema: "linters-settings" does not validate with "/properties/linters-settings/additionalProperties": additional properties 'maligned' not allowed
jsonschema: "run" does not validate with "/properties/run/additionalProperties": additional properties 'exclude-files' not allowed
Failed executing command with error: the configuration contains invalid elements
Apologies there going to be a lot of text, but I would suggest to extract roots, _ = x509.SystemCertPool() logic into separate pull request, as I could not approve it, and this require approval/review from project owners.
So what most likely should go in documentation in some form as a recomendation
- The recommended approach is not to package certicates with the docker/code/system unless it's a close-source software.
- In scope of work, the open source service
external-dnsis going to make a call to a hyperscaler via a proxy, andexternal-dnsknow nothing about certicate that is required for proxy, so in theory it should not be available on system level. I made this up, not sure how to explain correctly. - To alligh with Industry standards is to issue a short lived intermediate certificates and mount them in the same mount or split them by TTL lifetime
- ^ one could re-package application, if there is a requirement to package application and certificate instead of mounting them at runtime
- It very nice to support multiple mount point location for certificates, for cases with multiple issuers, or different TTL, but let's not overcomplicate this, so we could ignore that
I would reference here The Adobe Common Controls Framework (CCF), here is a link https://www.adobe.com/trust/compliance/adobe-ccf.html, there are around 4000+ controls, but we only interested in few very relevant
The Go code involving x509.CertPool relates to several key security domains within CCF.
Relevant CCF Controls for x509.CertPool Security
1. Cryptographic Controls & Certificate Management
Related CCF Domains:
- CCF-CM-1: Use of Strong Cryptographic Standards
- CCF-CM-2: Certificate and Key Management
- CCF-CM-3: Trusted Certificate Authorities
How It Relates to the Code:
- Using
x509.SystemCertPool()ensures system-trusted certificates are used, aligning with CCF’s requirement to rely on approved CAs. - Using an empty
x509.CertPooland adding/mounting CAs aligns with stricter cryptographic controls by allowing explicit trust settings.
Recommendation:
- Ensure certificates are sourced from a trusted and validated CA.
- Regularly update and rotate CA certificates to comply with lifecycle management policies.
2. System & Network Security
Related CCF Domains:
- CCF-NS-3: Secure TLS Configurations
- CCF-NS-5: Prevent Unauthorized Certificate Authorities
How It Relates to Your Code:
- If the system CA store is compromised,
x509.SystemCertPool()could trust unverified CAs. - A custom
CertPoolprevents unauthorized certificates, improving security.
Recommendation:
- Use TLS best practices (e.g., enforcing strong ciphers and TLS 1.2+).
- Monitor system CA stores for unauthorized modifications.
- Prefer explicit trust models (Option 2) in high-security environments.
3. Error Handling & Secure Coding Practices
Related CCF Domains:
- CCF-SD-1: Secure Software Development Lifecycle (SDLC)
- CCF-SD-3: Proper Error Handling
How It Relates to Code:
- Ignoring errors (
_ = x509.SystemCertPool()) is a bad security practice. - If
x509.SystemCertPool()fails (e.g., Windows lacks support), a nilrootscould cause panics or security failures.
Recommendation:
- Always check for errors when calling
x509.SystemCertPool(). - Implement logging and alerting for failures.
| Option | Security Implication | CCF Compliance Alignment |
|---|---|---|
Option 1 (SystemCertPool()) |
Trusts system CAs (convenient but can be risky) | ✅ Meets CCF-CM-3 (Trusted CAs) but ❌ may not meet CCF-NS-5 (Unauthorized CAs) |
Option 2 (Empty CertPool) |
Only explicitly added CAs are trusted | ✅ Stronger security, aligns with CCF-NS-5 and CCF-CM-2 |
@ivankatliarchuk sure I can change it back to start form empty cert pool no issues.. We can make that as part of a separate discussion
@ivankatliarchuk do you have any idea about the ci lint job error?
I'll try to find a solution for linter. We recently added few things, looks like a bug was introduced
@ivankatliarchuk I can debug and try to fix it in a different PR if its fine by you?
A fix for this issue appears to be available in another pull request (https://github.com/kubernetes-sigs/external-dns/pull/5085/files). Hopefully, it will be reviewed and approved soon.
A fix for this issue appears to be available in another pull request (https://github.com/kubernetes-sigs/external-dns/pull/5085/files). Hopefully, it will be reviewed and approved soon.
Ah I see.. Let's wait for it then.. otherwise does the PR look good to you?
A fix for this issue appears to be available in another pull request (https://github.com/kubernetes-sigs/external-dns/pull/5085/files). Hopefully, it will be reviewed and approved soon.
Ah I see.. Let's wait for it then.. otherwise does the PR look good to you?
Few things to consider, and something like a test on real-cluster/local-cluster will speed up review.
As @mloiseleur pointed out, you need to consider what to add to a documentation, maybe as well provide an example setup with a proxy of your choice in the docs on top of the description/recomendations.
A fix for this issue appears to be available in another pull request (https://github.com/kubernetes-sigs/external-dns/pull/5085/files). Hopefully, it will be reviewed and approved soon.
Ah I see.. Let's wait for it then.. otherwise does the PR look good to you?
Few things to consider, and something like a test on real-cluster/local-cluster will speed up review.
As @mloiseleur pointed out, you need to consider what to add to a documentation, maybe as well provide an example setup with a proxy of your choice in the docs on top of the description/recomendations.
Hmm.. I am not sure how to test it in real world since I dont have a proxy working or any real world ca-certs. I can work on the documentation though. My unit test actually tries to mimic the real world scenario that the issue was talking about.. it creates a root CA first and then uses the rootCA to issue child certs and child keys which is then fed to aws config through http transport.. but if you have better suggestions might help. Meanwhile I can check on the documentation
I'll throw some ideas
- [ ] There is an issue, so worth to ask a requestor to share his setup, he may be able to point out or explain things around how to configure proxy and certificates.
- [ ] How AWS credentials provided to external-dns when it's expected to run behind a proxy
- [ ] Instrument a local proxy with certificates in minikube or some other cluster of you choice
- [ ] Review this code there is an example how to instrument aws-sdk-go-v2 behind a proxy. Not as useful code, but go with proxy example
- [ ] The unit tests should have a proxy setup and aws stubs something like
test -> HTTP proxy -> calls to fake AWS provider
@ivankatliarchuk I didnt think it needed to be that detailed since the idea is similar to the rfc2136 provider where similar kind of tests are being done for TLS. @mloiseleur what do you think? Definitely worth asking the requestor if he can share his setup
From the credentials perspective I dont think anything changes since we are just providing TLS certs for the traffic to flow.. the environmental values or the way AWS picks up creds should be same
Np, I think I shared enough information to consider. Could be that I did not spend enough time to understand the original issue, and this is my understanding of client with forward proxy
I found this code for older go-sdk version, that nicely describes go client behind a proxy case https://gist.github.com/jakexks/2f876697dfca1fe15b92f7bb6032780d
@ivankatliarchuk that makes sense what you sent to check if the proxy is working and if the request coming from the proxy with the TLS CA Cert is trusted by AWS to generate the config. Imo getting the raw credentials through profile or environmental variables wouldnt change since that is dictated by the actual client the only thing that changes is how the request is made.
In summary, I agree we need a better way to test the use case of proxy, but I feel from a unit test perspective that is beyond the scope of unit test. @mloiseleur What do you suggest?
It would be nice if @cavdhut could give us more insights on how he uses the setup and if the changes made accommodate the use case he spoke about.
@hjoshi123 On RFC2136 provider, there is a test ensuring that all parameters are effectively taken into account, see https://github.com/kubernetes-sigs/external-dns/blob/8a69069c36b156d3e16795d31a3fba6c41204050/provider/rfc2136/rfc2136_test.go#L425-L495
Wdyt of doing the same for aws provider ?
Hmm.. I am not sure how to test it in real world since I dont have a proxy working or any real world ca-certs.
This project is widely used. We cannot merge something that has not been properly tested. For a simple manual test, you can setup locally a small proxy with TLS auth.
@mloiseleur I can try the stub provider as its shown in the RFC2136 and push it.. essentially in the unit test, I am doing the same thing, I am creating a CA using that to sign a cert and then using that as TLSConfig... given that part is already there I can try the stub to see if the provider works as required.. will clean up some of the code as you mentioned.. does that work?
sure, it sounds good.
Without reproducable setup on local machine with mininkute or kind, this going to be on hold. As need to be proper smoke tested. Access to AWS via proxy and rfc2136 have not much in common.
This is the high-lvel architecture diagram for an infrastructure, the code is not following it
PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied - After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied - After 30d of inactivity since
lifecycle/rottenwas applied, the PR is closed
You can:
- Mark this PR as fresh with
/remove-lifecycle stale - Close this PR with
/close - Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied - After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied - After 30d of inactivity since
lifecycle/rottenwas applied, the PR is closed
You can:
- Mark this PR as fresh with
/remove-lifecycle rotten - Close this PR with
/close - Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied - After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied - After 30d of inactivity since
lifecycle/rottenwas applied, the PR is closed
You can:
- Reopen this PR with
/reopen - Mark this PR as fresh with
/remove-lifecycle rotten - Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closed this PR.
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity,
lifecycle/staleis applied- After 30d of inactivity since
lifecycle/stalewas applied,lifecycle/rottenis applied- After 30d of inactivity since
lifecycle/rottenwas applied, the PR is closedYou can:
- Reopen this PR with
/reopen- Mark this PR as fresh with
/remove-lifecycle rotten- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.