
`certificate-authority-data` from kubeconfig fails for long-running process due to tmpfs cleanup

Open PaulFurtado opened this issue 2 years ago • 3 comments

What happened (please include outputs or screenshots): Long-running applications using a kubeconfig with certificate-authority-data encounter errors like:

Max retries exceeded with url: /api/v1/pods?fieldSelector=spec.nodeName%3Dip-172-18-66-203.ec2.internal (Caused by SSLError(FileNotFoundError(2, 'No such file or directory'),))

What you expected to happen: Client should not expect tempfiles to live indefinitely. It is extremely common for servers to reap tempfiles.

Anything else we need to know?: If you create a client, you can easily see that it relies on a temp file that must persist for the entire lifetime of the client:

>>> import kubernetes.config
>>> kubernetes.config.load_kube_config()
>>> api_client = kubernetes.client.ApiClient()
>>> api_client.rest_client.pool_manager.connection_pool_kw["ca_certs"]
'/tmp/tmpqkht2v2g'

You can reproduce the issue by deleting that temp file and attempting to make a request.

Code is here: https://github.com/kubernetes-client/python/blob/1271465acdb80bf174c50564a384fd6898635ea6/kubernetes/base/config/kube_config.py#L63-L78
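For context, the failure mode can be reproduced without the Kubernetes client at all. The following is a simplified, hypothetical sketch (stdlib only; the function name and details are illustrative, not the library's actual code) of the pattern the linked code follows: decode the CA data once at startup, write it to a temp file, and hand that path to the TLS layer for the life of the process.

```python
import atexit
import base64
import os
import tempfile


def create_temp_file_with_content(content: bytes) -> str:
    """Write content to a temp file that is only removed at interpreter exit."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    # Cleanup is deferred to process exit, so the path is expected to
    # remain valid for the whole run -- exactly the assumption that a
    # tmpfs reaper on /tmp violates for long-running processes.
    atexit.register(os.remove, path)
    return path


# certificate-authority-data in a kubeconfig is base64-encoded PEM.
ca_data_b64 = base64.b64encode(b"-----BEGIN CERTIFICATE-----\n...\n")
ca_path = create_temp_file_with_content(base64.b64decode(ca_data_b64))
# If a /tmp cleaner removes ca_path mid-run, every later TLS handshake
# that re-reads the bundle fails with FileNotFoundError, as reported above.
```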

In order for this to work reliably for long-running processes on standard Linux systems, the temp file really needs to be created for each request rather than a single time at startup.

That said, on Linux systems, a potential hack would be to use /proc/self/fd/<fileno> instead of the tempfile path, since that path's lifetime is tied to the process rather than to the file on disk.
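A minimal sketch of that /proc/self/fd idea (Linux only; the helper name is hypothetical): write the decoded bundle to a temp file, unlink the on-disk path immediately, and keep the descriptor open so the inode survives. Any /tmp cleaner then has nothing left to delete, yet opening the /proc path still reads the data.

```python
import os
import tempfile


def pinned_temp_path(content: bytes) -> str:
    """Return a /proc path for content whose lifetime matches this process.

    The returned fd is intentionally never closed; it pins the (already
    unlinked) inode for as long as the process runs.
    """
    fd, tmp_path = tempfile.mkstemp()
    os.write(fd, content)
    # Unlink the visible path right away: a tmpfs reaper can no longer
    # break us, because the open fd keeps the inode alive.
    os.unlink(tmp_path)
    return f"/proc/self/fd/{fd}"
```

Opening `/proc/self/fd/N` re-opens the underlying inode with a fresh offset, so it behaves like a normal file path for SSL libraries that take a `ca_certs` filename. The obvious trade-offs: it is Linux-specific, and the descriptor is deliberately leaked for the process lifetime.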

Environment:

  • Python version (python --version): 3.6
  • Python client version (pip list | grep kubernetes): 21.7.0

PaulFurtado avatar Apr 20 '22 00:04 PaulFurtado

/assign @yliaog

roycaihw avatar Apr 25 '22 16:04 roycaihw

It seems reasonable; do you mind sending a PR?

yliaog avatar Apr 26 '22 04:04 yliaog

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 25 '22 04:07 k8s-triage-robot