`certificate-authority-data` from kubeconfig fails for long-running process due to tmpfs cleanup
What happened (please include outputs or screenshots):
Long-running applications using a kubeconfig with certificate-authority-data
encounter errors like:
```
Max retries exceeded with url: /api/v1/pods?fieldSelector=spec.nodeName%3Dip-172-18-66-203.ec2.internal (Caused by SSLError(FileNotFoundError(2, 'No such file or directory'),))
```
What you expected to happen: The client should not expect tempfiles to live indefinitely; it is extremely common for servers to reap old files in /tmp.
Anything else we need to know?: If you create a client, you can easily see that it relies on a temp file that must not go away for the lifetime of the client:
```python
>>> import kubernetes.config
>>> kubernetes.config.load_kube_config()
>>> api_client = kubernetes.client.ApiClient()
>>> api_client.rest_client.pool_manager.connection_pool_kw["ca_certs"]
'/tmp/tmpqkht2v2g'
```
You can reproduce the issue by deleting that temp file and attempting to make a request.
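For example, a minimal reproduction along those lines (assuming a kubeconfig whose cluster entry uses `certificate-authority-data`):

```python
import os

import kubernetes.client
import kubernetes.config

kubernetes.config.load_kube_config()
api_client = kubernetes.client.ApiClient()

# Simulate a tmpfiles cleaner reaping the decoded CA certificate.
os.remove(api_client.rest_client.pool_manager.connection_pool_kw["ca_certs"])

# The next request fails with an SSLError wrapping FileNotFoundError,
# which is exactly what a long-running process sees after /tmp is cleaned.
kubernetes.client.CoreV1Api(api_client).list_namespace()
```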
Code is here: https://github.com/kubernetes-client/python/blob/1271465acdb80bf174c50564a384fd6898635ea6/kubernetes/base/config/kube_config.py#L63-L78
For this to work reliably in long-running processes on standard Linux systems, the temp file really needs to be created for each request rather than once at startup.
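Until the library does that, a rough application-side stopgap is to rebuild the client whenever the tempfile has been reaped; a minimal sketch, where `fresh_api_client` is a hypothetical helper and not library API:

```python
import os

import kubernetes.client
import kubernetes.config

def fresh_api_client():
    # load_kube_config() decodes certificate-authority-data into a new
    # tempfile each time it is called, replacing any reaped one.
    kubernetes.config.load_kube_config()
    return kubernetes.client.ApiClient()

api_client = fresh_api_client()

# Before a request, check whether the CA tempfile still exists.
ca_path = api_client.rest_client.pool_manager.connection_pool_kw["ca_certs"]
if not os.path.exists(ca_path):
    api_client = fresh_api_client()  # CA tempfile was reaped; start over
```

Note that rebuilding the `ApiClient` discards its connection pool, so this is only a stopgap, not a fix.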
That said, on Linux systems a potential hack would be to use /proc/self/fd/&lt;fileno&gt;
instead of the tempfile path, since that path shares the lifecycle of the process.
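A minimal sketch of that hack (Linux-only; the helper and the wiring into the client configuration are hypothetical, not the library's current behavior):

```python
import base64
import tempfile

import kubernetes.client

def ca_path_for_process(ca_data_b64):
    # Decode certificate-authority-data into a tempfile, but hand back a
    # /proc/self/fd path: even if the /tmp entry is reaped, the path stays
    # valid for as long as the file object (and thus the fd) is kept alive.
    f = tempfile.NamedTemporaryFile(delete=False)
    f.write(base64.standard_b64decode(ca_data_b64))
    f.flush()
    return f, "/proc/self/fd/%d" % f.fileno()

# Placeholder: in practice this comes from the kubeconfig's
# certificate-authority-data field.
ca_data_b64 = base64.standard_b64encode(b"dummy PEM bytes")

ca_file, ca_path = ca_path_for_process(ca_data_b64)
configuration = kubernetes.client.Configuration()
configuration.ssl_ca_cert = ca_path
```

The returned file object has to stay referenced: once it is garbage-collected the descriptor closes and the /proc path disappears.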
Environment:
- Python version (`python --version`): 3.6
- Python client version (`pip list | grep kubernetes`): 21.7.0
/assign @yliaog
It seems reasonable; do you mind sending a PR?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale