trust chain is not followed when Kubernetes CAs are intermediate CAs

Open brainplot opened this issue 2 years ago • 17 comments

What happened (please include outputs or screenshots): I was trying to use the client to obtain info about the running pods in a freshly installed Kubernetes cluster, using exactly the example provided in the README.md, but I was hit with this SSL error:

Listing pods with their IPs:
Traceback (most recent call last):
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1058, in _validate_conn
    conn.connect()
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.10/ssl.py", line 1100, in _create
    self.do_handshake()
  File "/usr/lib/python3.10/ssl.py", line 1371, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1007)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/student/pods.py", line 8, in <module>
    ret = v1.list_pod_for_all_namespaces(watch=False)
  File "/home/student/.local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 17485, in list_pod_for_all_namespaces
    return self.list_pod_for_all_namespaces_with_http_info(**kwargs)  # noqa: E501
  File "/home/student/.local/lib/python3.10/site-packages/kubernetes/client/api/core_v1_api.py", line 17596, in list_pod_for_all_namespaces_with_http_info
    return self.api_client.call_api(
  File "/home/student/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/home/student/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/home/student/.local/lib/python3.10/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/home/student/.local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 244, in GET
    return self.request("GET", url,
  File "/home/student/.local/lib/python3.10/site-packages/kubernetes/client/rest.py", line 217, in request
    r = self.pool_manager.request(method, url,
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/request.py", line 77, in request
    return self.request_encode_url(
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/request.py", line 99, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 827, in urlopen
    return self.urlopen(
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 827, in urlopen
    return self.urlopen(
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 827, in urlopen
    return self.urlopen(
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/home/student/.local/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='k8s.desolabs.com', port=6443): Max retries exceeded with url: /api/v1/pods?watch=False (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1007)')))

What you expected to happen: I was expecting the example to work 😄

How to reproduce it (as minimally and precisely as possible): To be honest, I'm not sure. This is a freshly installed Ubuntu machine with a freshly-installed Kubernetes cluster.

Anything else we need to know?: The cluster is generating its certificates using a custom CA that all nodes trust (thanks to the update-ca-certificates script), including the one I'm running this on. It should be noted that kubectl works perfectly fine with no issues whatsoever!
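One way to see what Python itself trusts on a machine like this is to ask the standard `ssl` module. The sketch below is a diagnostic only, and the function name `default_trust_summary` is made up for illustration. It reports the interpreter's default trust store, which may not be the store the Kubernetes client ends up using when the kubeconfig supplies its own CA data.

```python
import ssl


def default_trust_summary():
    """Return where Python's ssl module looks for CA certificates by default."""
    paths = ssl.get_default_verify_paths()
    # create_default_context() loads the default trust store for server auth.
    # Certificates found via a capath directory are loaded lazily, so the
    # count below can be lower than the number of files on disk.
    stats = ssl.create_default_context().cert_store_stats()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        "loaded_ca_certs": stats["x509_ca"],
    }


if __name__ == "__main__":
    for key, value in default_trust_summary().items():
        print(f"{key}: {value}")
```

If the custom root CA was installed with update-ca-certificates, it would be expected to sit in the bundle or directory reported here.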

Environment:

  • Kubernetes version (kubectl version):

    Client Version: v1.28.4
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.28.4
    
  • OS (e.g., MacOS 10.13.6):

    Ubuntu 22.04.3 LTS
    Linux student 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
    
  • Python version (python --version)

    Python 3.10.12
    
  • Python client version (pip list | grep kubernetes)

    kubernetes                   28.1.0
    

brainplot avatar Dec 09 '23 05:12 brainplot

It's a problem with the urllib3 version. Try using a 1.x version of urllib3.

eloymg avatar Dec 11 '23 12:12 eloymg

I think I'm already using that.

$ pip list | grep urllib
urllib3                1.26.5

If I try to pip install the requirements.txt file that's provided in the repo, nothing gets installed/updated. According to pip, my dependencies meet the version requirements.

brainplot avatar Dec 11 '23 12:12 brainplot

@eloymg I have the same problem. I am using the following kubeconfig file:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: Base64-Encrypted Key
    server: https://test.....cloud:6443
  name: kubernetes

contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes

current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    token: Base 64 Token

And the following lines:

from kubernetes import client, config

config.load_kube_config('./kube_config')
v1 = client.CoreV1Api()
v1.list_pod_for_all_namespaces(watch=False)
[ WARN ] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1000)'))': /api/v1/pods?watch=False
[ WARN ] Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1000)'))': /api/v1/pods?watch=False
[ WARN ] Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1000)'))': /api/v1/pods?watch=False

urllib3 version: 1.26.18

gleees384 avatar Jan 23 '24 09:01 gleees384

@eloymg I have the same problem too.

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='xxxxx', port=xxxx): Max retries exceeded with url: /apis/batch/v1/namespaces/default/jobs (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1006)')))

hai0118 avatar Jan 23 '24 09:01 hai0118

After a bit of digging, I found out what the cause of my issue is. The problem occurs when I manually generate my Kubernetes CA certificates as intermediate certificates using a custom CA.

I followed this guide to do so.

I would like to point out that the Root CA certificate that was used to generate the intermediate CA certificates (as shown in the link above) is trusted by the machine and was placed under /usr/local/share/ca-certificates. Like I said, kubectl and the rest of Kubernetes in general work just fine! It's just this client that doesn't. It's as if it expects the Kubernetes CA certificates to be root certificates, without following the trust chain.

brainplot avatar Jan 24 '24 09:01 brainplot

I would like to point out that the Root CA certificate that was used to generate the intermediate CA certificates (as shown in the link above) is trusted by the machine and was placed under /usr/local/share/ca-certificates.

@brainplot Nice finding! I wonder if you would like to propose a fix?

roycaihw avatar Feb 12 '24 17:02 roycaihw

Hi,

After reading the rest.py code:

# cert_reqs
if configuration.verify_ssl:
    cert_reqs = ssl.CERT_REQUIRED
else:
    cert_reqs = ssl.CERT_NONE

In your code, try:

from kubernetes import client, config

config.load_kube_config('./kube_config')
config.verify_ssl=False                                     ## <<< Perhaps this can be set up in config
v1 = client.CoreV1Api()
v1.list_pod_for_all_namespaces(watch=False)

It works for me (no more SSL issues). My code:

import kubernetes.client
import urllib3

configuration = kubernetes.client.Configuration()
# Configure API key authorization: BearerToken
configuration.api_key['authorization'] = 'YOUR_API_KEY'
# Uncomment below to set up a prefix (e.g. Bearer) for the API key, if needed
# configuration.api_key_prefix['authorization'] = 'Bearer'

# Silence the warnings urllib3 emits for unverified HTTPS requests
urllib3.disable_warnings()

# Defining host is optional and defaults to http://localhost
configuration.host = "https://10.96.0.1"
# Disable TLS certificate verification (insecure)
configuration.verify_ssl = False

# Create an API client from the configuration
api_client = kubernetes.client.ApiClient(configuration)

# Create an instance of the API class
api_instance = kubernetes.client.WellKnownApi(api_client)

coxifred avatar Feb 13 '24 10:02 coxifred

I understand how that can work but there's no reason why I should disable SSL/TLS verification since my setup has a perfectly valid certificate trust chain.

brainplot avatar Feb 17 '24 21:02 brainplot

Same problem here connecting to an EKS v1.26 cluster using in-cluster configuration. Tried:

config.load_incluster_config()
config.verify_ssl=False

Still doesn't work:

WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)'))': /api/v1/namespaces/sfuga/configmaps

I'm using Alpine linux v3.19:

# apk info py3-urllib3
py3-urllib3-1.26.18-r0 description:
HTTP library with thread-safe connection pooling, file post, and more

py3-urllib3-1.26.18-r0 webpage:
https://github.com/urllib3/urllib3

py3-urllib3-1.26.18-r0 installed size:
580 KiB

atmosx avatar Feb 22 '24 12:02 atmosx

Spent a few hours debugging this issue. It appears that the API and client are functioning as expected, but the error message is confusing for users. The issue is caused by the size of the configMap.

Kubernetes configMaps have a size limit of 1MB. This limit is set by etcd, which has a limit of 1.5MB. When the object exceeds 1MB, urllib3 returns an error that is not very clear.

In my case the file was ~12MB, so it obviously doesn't fit in a configMap.
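A rough pre-flight check along these lines can catch an oversized payload before the request is sent. This is only a sketch: the names `CONFIGMAP_LIMIT_BYTES`, `configmap_data_size` and `fits_in_configmap` are made up for illustration, and the estimate counts only the keys and values of the data, ignoring metadata and serialization overhead.

```python
CONFIGMAP_LIMIT_BYTES = 1024 * 1024  # 1 MiB, the limit described above


def configmap_data_size(data):
    """Approximate the size in bytes of a ConfigMap's string data."""
    return sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in data.items()
    )


def fits_in_configmap(data, limit=CONFIGMAP_LIMIT_BYTES):
    """Return True if the data is within the assumed size limit."""
    return configmap_data_size(data) <= limit
```

Calling `fits_in_configmap(data)` before creating the object gives an early, readable failure instead of a connection error.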

Here is some sample code to test that configMap creation works:

# Import necessary libraries
from kubernetes import client, config

# Load in-cluster configuration
config.load_incluster_config()

# Create a Kubernetes API client
v1 = client.CoreV1Api()

# Define the configmap data
data = {"data": "123"}

# Create the configmap object
configmap = client.V1ConfigMap(
    api_version="v1",
    kind="ConfigMap",
    metadata=client.V1ObjectMeta(
        name="sample"
    ),
    data=data
)

# Create the configmap in the cluster
v1.create_namespaced_config_map(namespace="sfuga", body=configmap)

# Print success message
print("Configmap created successfully.")

atmosx avatar Feb 22 '24 17:02 atmosx

@atmosx I'm honestly unsure that is relevant here. I had this issue just trying to list pods in my cluster. It's clearly something to do with the certificate the API server serves.

brainplot avatar Feb 26 '24 13:02 brainplot

@brainplot

I had the same issue. I solved it by adding the certificate-authority key to my kubeconfig, as mentioned in this post: https://stackoverflow.com/questions/48351308/how-to-specify-ca-bundle-in-kubernetes-python-client
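One way to act on this, assuming the failure comes from the kubeconfig's CA data holding only the intermediate CA, is to build a bundle that also contains the root CA. The sketch below uses only the standard library; the function names `build_ca_bundle` and `write_ca_bundle` are hypothetical, and the inputs are the base64 string from certificate-authority-data plus the root CA in PEM form.

```python
import base64


def build_ca_bundle(ca_data_b64, root_ca_pem):
    """Return PEM text with the kubeconfig CA followed by the root CA."""
    # certificate-authority-data is a base64-encoded PEM certificate.
    cluster_ca_pem = base64.b64decode(ca_data_b64).decode("ascii")
    parts = [cluster_ca_pem.strip(), root_ca_pem.strip()]
    return "\n".join(parts) + "\n"


def write_ca_bundle(ca_data_b64, root_ca_pem, out_path):
    """Write the combined bundle to out_path and return the path."""
    with open(out_path, "w", encoding="ascii") as handle:
        handle.write(build_ca_bundle(ca_data_b64, root_ca_pem))
    return out_path
```

The resulting file could then be referenced from the certificate-authority key in the kubeconfig, or assigned to `ssl_ca_cert` on a `kubernetes.client.Configuration` object. This has not been verified against the cluster from this issue.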

louisgls avatar Mar 28 '24 10:03 louisgls

@louisgls I no longer need this library, so I don't have a reason to try this. However, thank you for providing a solution.

brainplot avatar Mar 28 '24 19:03 brainplot

I'm seeing the same issue when I create a cluster using a single CA certificate as intermediate, as described in brainplot's comment. As this is a valid configuration described in Kubernetes' own docs and causes the minimal example described in this project's README.md to fail, I would consider this to be a bug.

@brainplot Thanks for your excellent troubleshooting. Would you mind retitling this issue as "client doesn't follow trust chain when using single CA certificate as intermediate" or something of the sort?

inflatador avatar Apr 20 '24 17:04 inflatador

Thank you @inflatador. I've updated the title and I believe the new one better describes the issue. If not, we can discuss how to clarify further.

brainplot avatar Apr 20 '24 20:04 brainplot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 19 '24 20:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Aug 18 '24 20:08 k8s-triage-robot