
Node errors when cluster.controlPlane.endpoint is set to https://kubernetes.default.svc.cluster.local

Open varet80 opened this issue 1 year ago • 7 comments

I migrated from Kubeadm to Talos.

When I join a new controlplane node, following the documentation and setting the service-account-issuer to cluster.controlPlane.endpoint, I see errors in the console and the node cannot join the network.

On the other hand, everything else seems to work as it should.

If I set the endpoint to https:// and then join the cluster, the API server cannot authenticate and generates lines like:

 authentication.go:73] "Unable to authenticate the request" err="invalid bearer token"

The only way to join successfully without any errors is to:

  • apply-config with the load balancer URL
  • edit the machine config and change the endpoint back to https://kubernetes.default.svc.cluster.local

In the screenshot, I joined a node with the controlplane endpoint set to the internal hostname, which is not resolvable.

This is a screenshot from the VM.

[Image: Screenshot 2024-07-17 at 2 02 09 PM]

varet80 avatar Jul 17 '24 12:07 varet80

It doesn't make sense to set the controlplane endpoint to kubernetes.default.svc.cluster.local in any case, as this is the external way to access the controlplane (not from within a Kubernetes pod).

If your case is to update service-account-issuer, let's make this field configurable in Talos.

smira avatar Jul 17 '24 12:07 smira

I can set the service account issuer to the internal one, but then there are a lot of other errors because it differs from the endpoint.

Also, should the service-account-issuer use the internal or the external endpoint in the case of kube-apiserver?

varet80 avatar Jul 17 '24 12:07 varet80

You can find it in the Kubernetes documentation:

Identifier of the service account token issuer. The issuer will assert this identifier in "iss" claim of issued tokens. This value is a string or URI. If this option is not a valid URI per the OpenID Discovery 1.0 spec, the ServiceAccountIssuerDiscovery feature will remain disabled, even if the feature gate is set to true. It is highly recommended that this value comply with the OpenID spec: https://openid.net/specs/openid-connect-discovery-1_0.html. In practice, this means that service-account-issuer must be an https URL. It is also highly recommended that this URL be capable of serving OpenID discovery documents at {service-account-issuer}/.well-known/openid-configuration. When this flag is specified multiple times, the first is used to generate tokens and all are used to determine which issuers are accepted.
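The last sentence of that description (first value issues, all values are accepted) can be illustrated with a small sketch. This is only a toy model of the flag semantics, not kube-apiserver code: the `IssuerConfig` class and the `lb.example.com` URL are made up for illustration, and real validation also checks the token signature, audience, and expiry.

```python
from dataclasses import dataclass


@dataclass
class IssuerConfig:
    """Hypothetical model of repeated --service-account-issuer flags, in order."""

    issuers: list[str]

    def signing_issuer(self) -> str:
        # The first flag value is the one written into "iss" of new tokens.
        return self.issuers[0]

    def accepts(self, token_iss: str) -> bool:
        # Every listed value is accepted when an incoming token is checked.
        return token_iss in self.issuers


# An API server that only knows the in-cluster issuer.
old = IssuerConfig(["https://kubernetes.default.svc.cluster.local"])

# An API server that issues with a (made-up) load balancer URL
# but still accepts tokens carrying the previous issuer.
new = IssuerConfig([
    "https://lb.example.com:6443",
    "https://kubernetes.default.svc.cluster.local",
])

print(new.accepts(old.signing_issuer()))  # old tokens are still accepted
print(old.accepts(new.signing_issuer()))  # new tokens are rejected by `old`
```

Under this model, API servers that disagree on the issuer list will reject each other's tokens unless each one lists the other's first value.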

So it's not clear what you're trying to solve, but setting the controlplane endpoint to the Kubernetes service DNS name is certainly the wrong way.

smira avatar Jul 17 '24 12:07 smira

I agree. I am just confused about what the best action is here. As the kubeadm migration instructions state:

Make sure that, on your current Kubeadm cluster, the first --service-account-issuer= parameter in /etc/kubernetes/manifests/kube-apiserver.yaml is equal to the value of .cluster.controlPlane.endpoint in controlplane.yaml. If it’s not, add a new --service-account-issuer= parameter with the correct value before your current one in /etc/kubernetes/manifests/kube-apiserver.yaml on all of your control planes nodes, and restart the kube-apiserver containers.

https://www.talos.dev/v1.7/advanced/migrating-from-kubeadm/#step-by-step-guide

For kubeadm, that value is the internal endpoint (at least in many cases).
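The quoted step (make the endpoint the first issuer, keeping the current one after it) could be sketched roughly as follows. This assumes the kube-apiserver arguments have already been read from the manifest into a plain list of strings; the `ensure_first_issuer` helper and the `lb.example.com` URL are hypothetical, and the sketch does not edit any file or restart anything.

```python
FLAG = "--service-account-issuer="


def ensure_first_issuer(args: list[str], endpoint: str) -> list[str]:
    """Return a copy of args in which `endpoint` is the first issuer flag."""
    issuer_positions = [i for i, a in enumerate(args) if a.startswith(FLAG)]
    if issuer_positions and args[issuer_positions[0]] == FLAG + endpoint:
        # The first issuer already matches the endpoint; nothing to change.
        return list(args)
    # Insert before the current first issuer so the old value stays accepted.
    insert_at = issuer_positions[0] if issuer_positions else len(args)
    return args[:insert_at] + [FLAG + endpoint] + args[insert_at:]


# Example argument list; the values are placeholders, not a real manifest.
current = [
    "kube-apiserver",
    "--service-account-issuer=https://kubernetes.default.svc.cluster.local",
    "--secure-port=6443",
]
updated = ensure_first_issuer(current, "https://lb.example.com:6443")
for arg in updated:
    print(arg)
```

Running the helper a second time on its own output leaves the list unchanged, so it is safe to apply to every control plane node's argument list.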

On the contrary, bootstrapping a node with the right controlplane endpoint (the load balancer endpoint) leads to the apiserver complaining about the token issuer, as the URL is not the same as before: "Unable to authenticate the request" err="invalid bearer token". This happens because the apiserver param --service-account-issuer is also set to the LB endpoint (if that is indeed the best practice).

If I change the machine config to the internal URL after the node is ready, everything starts working.

Keeping the control plane on the public endpoint and adding an extra argument with the internal endpoint for that API server leads to more issues: it complains about a mismatch, as this way it registers both issuer endpoints.

Having the ability to override the parameter would probably help to avoid these cases.

varet80 avatar Jul 17 '24 13:07 varet80

I'm not quite sure how "the url is not the same as before" if you specify the loadbalancer endpoint, as all nodes will have the same URL for the controlplane endpoint and, transitively, for the service account issuer.

smira avatar Jul 17 '24 16:07 smira

It turns out that using the LB endpoint on all API servers stops the error. A migration path from kubeadm could be: first update the apiservers to your LB endpoint, and then begin the migration. Can I submit some documentation updates covering these potential errors, in order to help people who are migrating?

varet80 avatar Jul 19 '24 12:07 varet80

Yes, PRs are always appreciated! The file is at https://github.com/siderolabs/talos/blob/main/website/content/v1.8/advanced/migrating-from-kubeadm.md

steverfrancis avatar Jul 19 '24 15:07 steverfrancis

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jan 16 '25 02:01 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Jan 21 '25 02:01 github-actions[bot]