
Create ConfigMap containing accessor ids of all tokens provisioned by `server-acl-init`

Open onematchfox opened this issue 4 years ago • 6 comments

Changes proposed in this PR:

This PR proposes adding a new ConfigMap containing the accessor IDs of any tokens created by the server-acl-init job.

On a couple of occasions now I have found the need to augment the policy of the tokens provisioned by this job, but in order to do so one must know the accessor ID of the token to which the additional roles/policies should be attached. This is because the only unique identifier for a token is its accessor ID. Armed with only the description, or even the secret itself, one cannot deterministically ascertain the actual identifier of a token using the token endpoints of the API.

My initial requirement was just to grant read access to agent_prefix for the anonymous token, to allow the agent metrics endpoints to be hit without exposing any real secrets to the daemonset used to monitor these agents. My initial attempt to update the anonymous policy directly failed, as I soon found out that the server-acl-init job will overwrite any policy changes (which, incidentally, I believe is the underlying reason for https://github.com/hashicorp/consul-helm/issues/894). However, given the hard-coded accessor ID of the anonymous token, it was fairly easy to work around this, as one can attach additional policies to a token if one knows its ID.
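For reference, that workaround looks roughly like the following (the policy name here is illustrative, and this assumes a management token in `CONSUL_HTTP_TOKEN`; the anonymous token's accessor ID is fixed by Consul):

```shell
# Create a policy granting read access to all agents
# (policy name "anonymous-agent-read" is illustrative).
consul acl policy create -name anonymous-agent-read \
  -rules 'agent_prefix "" { policy = "read" }'

# Attach it to the anonymous token, whose accessor ID is always
# 00000000-0000-0000-0000-000000000002, without replacing existing links.
consul acl token update -id 00000000-0000-0000-0000-000000000002 \
  -policy-name anonymous-agent-read -merge-policies
```

The `-merge-policies` flag is what keeps this from clobbering the policies that server-acl-init manages on the same token.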

I subsequently started looking at Terminating Gateways and found that once again I needed a way to augment the default policy created by the server-acl-init job. From #264

For terminating gateways, users will need to update these created tokens to include permissions to the services they represent. To do this, they should add separate policies or attach service identities to avoid having this job overwrite their configuration.

The Terraform provider proved very useful in granting the ability to create an external node and service, but when it came to augmenting the policy for the terminating gateway token, I could not find any programmatic mechanism to determine the accessor ID of the token I needed to augment.

Given that this process is responsible for the creation of the tokens, I figured it could also store the accessor IDs somewhere so that they could be referenced by other processes as needed. I have taken the approach of using a single ConfigMap for all tokens, but am open to suggestions here.
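For illustration, the shape I have in mind is something like this (the name and keys below are placeholders, not necessarily what the PR emits):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: consul-acl-token-accessor-ids   # placeholder name
  namespace: consul
data:
  # AccessorIDs only - unlike SecretIDs, these are safe to expose
  client-acl-token: 6a1cf861-…                   # example value
  terminating-gateway-acl-token: 9bd22b6f-…      # example value
```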

One shortcoming of the approach taken here, which I haven't found a way to work around (again due to the inability to look up a token via the ACL API without knowing the AccessorID), is that only newly created tokens will be added to the ConfigMap, so this requires a freshly bootstrapped ACL system.

Also worth noting is that the changes listed here will require an update to be made to the server-acl-init Role in consul-helm (https://github.com/hashicorp/consul-helm/blob/master/templates/server-acl-init-role.yaml) to add:

- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - get 
  - update

Finally, as a side note, I did also consider the approach of using the controller to actually update the terminating gateway token policy automatically, based on the services listed in the TerminatingGateway config entry (CRD), as this requirement caught me off guard at first (my bad for not reading the documentation thoroughly enough). However, a prerequisite for that would be to have the accessor ID of the token available in the first place. So, this PR could even be seen as an enabler for that to happen.

How I've tested this PR:

The code has been built and deployed to a test cluster; existing secrets were deleted successively, and the ConfigMap was checked after rerunning the job.

Checklist:

  • [x] Tests added

onematchfox avatar Apr 22 '21 10:04 onematchfox

Hi @onematchfox thank you for the contribution! It seems like you are deploying Consul K8s without Consul Helm and perhaps directly looking to deploy the Consul K8s via Terraform. Could you tell us more why Consul Helm did not work for you after deploying your infrastructure via Terraform? The Consul Helm provider is also a popular choice among the community.

david-yu avatar Apr 22 '21 17:04 david-yu

Hi @david-yu,

Thanks for looking at this PR.

It seems like you are deploying Consul K8s without Consul Helm and perhaps directly looking to deploy the Consul K8s via Terraform.

My apologies for not making this explicitly clear. I am deploying consul-k8s using the Consul Helm chart.

Could you tell us more why Consul Helm did not work for you after deploying your infrastructure via Terraform?

As I explain in the description, I need to extend the default ACL policy generated by consul-k8s on some of the tokens it creates. I'll focus specifically on the terminating gateway token now as hopefully that use case is the most clear-cut. In order to add an external service to Consul and connect that via a terminating gateway with ACLs enabled one needs to:

  1. register an external node
  2. register an external service
  3. create a service intention between the in-mesh service and the external one
  4. register the service with the terminating gateway
  5. grant the terminating gateway the ACL permission service:write for the external service
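The grant in step 5 boils down to ACL rules along these lines (the external service name here is hypothetical):

```hcl
# Additional rules to attach to the terminating gateway's token so it
# can represent a hypothetical external service "example-external-db".
service "example-external-db" {
  policy = "write"
}
```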

~Only step 4 of the list above is currently possibly in a Kubernetes native way (using the TerminatingGateway CRD). To perform the other steps I have to use some other mechanism - be that direct calls to the Consul API or some form of wrapper, Consul CLI/Terraform provider/etc. Steps 1 to 3 are not an issue. However Step 5 is where my issue lies. As I see it, there are two options available (please do let me know if there is another option that I am unaware of! :smile:). Either I can:~ (edit below)

Steps 3 and 4 of the list above can be achieved in a Kubernetes-native way (using the ServiceIntentions and TerminatingGateway CRDs). But to perform the other steps I have to use some other mechanism - be that direct calls to the Consul API or some form of wrapper: the Consul CLI, Terraform provider, etc. Steps 1 and 2 are not an issue. However, step 5 is where my issue lies. As I see it, there are two options available (please do let me know if there is another option that I am unaware of! :smile:). Either I can:

  • adjust the existing policy attached to the terminating gateway token
  • attach a new role/policy to the token

The first option turns out not to be an option at all as the policy created by consul-k8s server-acl-init will be overwritten every time the process is run. This is made clear in the description of the PR that added the creation of the terminating gateway tokens - https://github.com/hashicorp/consul-k8s/pull/264.

So, I am left with the option of trying to attach a new role/policy to the token that was created by consul-k8s server-acl-init. To do that I need to be able to determine the correct token to attach the policy to - i.e. I need the AccessorID of the token so that a call can be made to https://www.consul.io/api-docs/acl/tokens#update-a-token to add the new role/policy. Currently that information, the AccessorID, is discarded by this process: only the SecretID is actually stored anywhere when the token is created. This means that, at present, to deterministically find the right token I would need to fetch the Kubernetes Secret and iterate over all tokens in the ACL system to find the AccessorID for that SecretID. This isn't exactly a nice thing to have to do. So, this PR attempts to surface the token AccessorIDs instead.
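For what it's worth, the brute-force lookup described above can be sketched as follows. This is a sketch rather than code from the PR; the function and parameter names are mine, and the two callables stand in for the Consul HTTP API (listing tokens, then reading each token individually) so the logic itself is self-contained:

```python
def find_accessor_id(secret_id, list_token_accessors, read_token):
    """Brute-force SecretID -> AccessorID lookup: the ACL API offers no
    direct reverse lookup, so list every token and read each one until
    a SecretID matches.

    list_token_accessors() -> iterable of AccessorID strings
                              (e.g. built from GET /v1/acl/tokens)
    read_token(accessor)   -> dict including "SecretID"
                              (e.g. GET /v1/acl/token/<accessor>)
    Returns the matching AccessorID, or None if nothing matches.
    """
    for accessor in list_token_accessors():
        if read_token(accessor).get("SecretID") == secret_id:
            return accessor
    return None
```

Beyond being awkward, this costs one read per token and requires credentials privileged enough to read every other token's SecretID - which is exactly why surfacing the AccessorIDs directly is preferable.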

Hopefully that explains things a little better?

onematchfox avatar Apr 23 '21 08:04 onematchfox

Hi, thanks for the PR. Overall I think this needs a deeper discussion because it feels like the ConfigMap with the accessor IDs is a workaround. I also think that we do need to solve for this use-case.

Some initial thoughts:

  1. Ideally we offer up a way through CRDs or the Helm chart to modify the ACL tokens/policies that consul-k8s creates. It is a bad UX that users have to "go back in" to k8s via Terraform, etc, to modify the ACL tokens/policies that were created by consul-k8s since those are supposed to be managed by us. Your current use-case with a terminating gateway is a good example. Ideally the TerminatingGateway CRD lets you spin up gateways and automatically modifies the ACL token for you.
  2. Would annotating the secret we create with the accessor ID be useful?
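For illustration, such an annotation might look like the following (the annotation key is purely hypothetical):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: consul-terminating-gateway-acl-token
  annotations:
    # Hypothetical annotation key carrying the token's AccessorID
    consul.hashicorp.com/token-accessor-id: 9bd22b6f-…
data:
  token: <base64-encoded SecretID>
```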

lkysow avatar May 06 '21 21:05 lkysow

Hi @lkysow,

Thanks for taking a look. I am more than happy to discuss options.

  1. Ideally we offer up a way through CRDs or the Helm chart to modify the ACL tokens/policies that consul-k8s creates. It is a bad UX that users have to "go back in" to k8s via Terraform, etc, to modify the ACL tokens/policies that were created by consul-k8s since those are supposed to be managed by us. Your current use-case with a terminating gateway is a good example. Ideally the TerminatingGateway CRD lets you spin up gateways and automatically modifies the ACL token for you.

Agreed! This is somewhat of a workaround. At the end of my description, I mentioned "I did also consider the approach of using the controller to actually update the terminating gateway token policy automatically based on the services listed in the TerminatingGateway config entry (CRD)". I completely agree that that would be a much better user experience! The reasons I didn't go that far were:

  • this would at least require the accessor ID to be accessible by the controller, which at present it is not - so I saw this as a stepping stone to achieving that
  • adjusting the Consul controller to cater for this would require that the controller is granted ACL write permissions which I wasn't sure would be accepted
  • finally, this would break from the current pattern/implementation/functionality of the TerminatingGateway config using other mechanisms within Consul. So, once again I wasn't sure if that was something that would be considered/accepted. E.g. would one then expect the standard HCL config or a Terraform consul_config_entry resource to behave the same way? This led me down a rabbit hole into whether this should just become standard behaviour within Consul itself and I just didn't feel like that was something I was in a position to judge the worthiness of.
  2. Would annotating the secret we create with the accessor ID be useful?

I did consider adjusting the secret itself. The reason I didn't go down this route is that, from an RBAC perspective, accessing the secret to obtain the accessor ID would require a user to have read access to the secrets in the consul namespace. That permission in turn gives them access to the bootstrap token - a situation which, for fairly obvious reasons, I would rather avoid. That said, if the controller itself is going to be maintaining the ACL policy, then this would cater for allowing that to happen.

As a final point, my only other consideration when approaching this was to acknowledge that I am very focused on my specific use cases, and there may be others that come up. As an example, I also mentioned extending the anonymous token policy to allow access to metrics endpoints. In this case, I can hard-code the token ID, but that feels a little brittle to me, as there are no guarantees that this ID will not change and, in doing so, break the code I have written against it. Whilst I do agree that using Terraform to come back into Kubernetes is perhaps not an ideal solution, I do feel that it is better than a hard-coded ID in a token/policy update somewhere. Would you also expect the controller to handle update mechanisms for other tokens? Or perhaps server-acl-init should be able to take in some additional parameters so a user can extend the default policies? Personally, I felt that trying to foresee and cater for all potential use cases where users may want to augment the default policies would lead to a maintenance nightmare, and hence took the option of just exposing the token accessor IDs so folks can do with them as they see fit.

onematchfox avatar May 07 '21 07:05 onematchfox

Yeah I think you're totally right. Thinking more about this, even if we make everything configurable via Helm/CRD there's still probably going to be a use-case that requires exposing the accessor IDs and right now we don't make everything configurable so this is a great escape hatch.

So I actually think this makes a lot of sense. It's kinda like a Terraform output.

This is a medium-sized change so we'll need to discuss it as a team and prioritize it onto backlog.

lkysow avatar May 10 '21 16:05 lkysow

CLA assistant check
All committers have signed the CLA.

hashicorp-cla avatar Mar 12 '22 17:03 hashicorp-cla