Reading returncode on a failed stream command raises an "invalid literal" error instead of the error cause
What happened (please include outputs or screenshots):
When trying to read returncode on a WSClient object that failed, a misleading error message is raised:
File "site-packages/kubernetes/stream/ws_client.py", line 238, in returncode
self._returncode = int(err['details']['causes'][0]['message'])
ValueError: invalid literal for int() with base 10: 'error executing command in container: failed to exec in container:
failed to start exec "xxxxxxxxxxxxxxxx": OCI runtime exec failed: exec failed: conta
What you expected to happen:
A more informative error is raised, containing the actual error cause: starting container process caused: exec: \"sleep 100\": executable file not found in $PATH: unknown
I added a breakpoint where returncode is read when the error happens and dumped the contents of err. It looks like this, and the real error cause is available:
{
  "metadata": {},
  "status": "Failure",
  "message": "Internal error occurred: error executing command in container: failed to exec in container: failed to start exec \"xxxxxxxxxxxxxxxxxx\": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: \"sleep 100\": executable file not found in $PATH: unknown",
  "reason": "InternalError",
  "details": {
    "causes": [
      {
        "message": "error executing command in container: failed to exec in container: failed to start exec \"276a8816e6619bb8e43a4a6664d2ac89503979d0223dee97be2c1847b7855cee\": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: \"sleep 100\": executable file not found in $PATH: unknown"
      }
    ]
  },
  "code": 500
}
How to reproduce it (as minimally and precisely as possible):
from kubernetes.stream import stream
...
...
s = stream(
    _client.connect_get_namespaced_pod_exec,
    _pod.metadata.name,
    _pod.metadata.namespace,
    command=["sleep 100"],  # This is wrong, which later triggers the faulty error detection
    async_req=False,
    stderr=True,
    stdin=False,
    stdout=True,
    tty=False,
    _preload_content=False,
)
while (retcode := s.returncode) is None:
    tail_logs(s)
tail_logs(s)
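For reference, one way to sidestep the ValueError in the snippet above is to skip returncode entirely and inspect the error channel yourself once the stream has closed. This is only a minimal sketch reusing s and tail_logs from the repro; it distinguishes success from failure but ignores ordinary non-zero exit codes:

import yaml
from kubernetes.stream.ws_client import ERROR_CHANNEL

# Drain output until the server closes the stream, then read the status
# object from the error channel directly instead of calling returncode.
while s.is_open():
    tail_logs(s)
tail_logs(s)

err = yaml.safe_load(s.read_channel(ERROR_CHANNEL)) or {}
if err.get("status") != "Success":
    # err["message"] holds the real cause shown in the dump above.
    raise RuntimeError(err.get("message", str(err)))
retcode = 0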
Environment:
- Kubernetes version: 1.23.5
- Python version: 3.9.7
- Python client version: 23.6.0
The issue remains; here is the relevant code: https://github.com/kubernetes-client/python/blob/master/kubernetes/base/stream/ws_client.py#L238
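For anyone looking at that line, one possible shape of a fix is to call int() only when the cause actually carries an exit code and to surface the server's message otherwise. This is a hedged sketch of a replacement for the returncode property in ws_client.py, not the library's actual code; it assumes the module's existing yaml import and ERROR_CHANNEL constant, and the "ExitCode" cause reason is how non-zero exits are typically reported, so treat that detail as an assumption:

@property
def returncode(self):
    # Sketch: inside WSClient, replacing the property around line 238.
    if self.is_open():
        return None
    if self._returncode is None:
        err = yaml.safe_load(self.read_channel(ERROR_CHANNEL)) or {}
        if err.get("status") == "Success":
            self._returncode = 0
        else:
            cause = (err.get("details", {}).get("causes") or [{}])[0]
            if cause.get("reason") == "ExitCode":
                # A normal non-zero exit: the cause message is the exit code.
                self._returncode = int(cause["message"])
            else:
                # Internal errors carry a textual cause; raise it rather than
                # letting int() fail with "invalid literal".
                raise RuntimeError(err.get("message", str(err)))
    return self._returncode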
I've just run into this as well.
Here's a hacky workaround:
import yaml

from kubernetes.stream.ws_client import ERROR_CHANNEL, WSClient


def get_kubernetes_ws_client_returncode(ws_client: WSClient) -> None | int | str:
    """Workaround for broken WSClient.returncode property."""
    # This is a modified version of the WSClient.returncode code.
    # See: https://github.com/kubernetes-client/python/issues/1840
    if ws_client.is_open():
        return None
    else:
        err = ws_client.read_channel(ERROR_CHANNEL)
        err = yaml.safe_load(err)
        if err["status"] == "Success":
            return 0
        errmsg = err["details"]["causes"][0]["message"]
        try:
            return int(errmsg)
        except ValueError:
            return errmsg
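Drop-in usage with the repro above would look roughly like this (s and tail_logs come from that repro, not from the client library):

# Poll with the workaround instead of the broken property; a string result
# means the server reported an error message rather than an exit code.
while (retcode := get_kubernetes_ws_client_returncode(s)) is None:
    tail_logs(s)
tail_logs(s)

if isinstance(retcode, str):
    raise RuntimeError(retcode)  # e.g. the OCI runtime message shown below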
Example data returned when reproducing this by trying to exec "kill" without a shell:
{
  "metadata": {},
  "status": "Failure",
  "message": "Internal error occurred: error executing command in container: failed to exec in container: failed to start exec \"5f5aa33d0d7470b9d01365d10140f31a5bb2a76d44c14288b694351c583429d8\": OCI runtime exec failed: exec failed: unable to start container process: exec: \"kill\": executable file not found in $PATH: unknown",
  "reason": "InternalError",
  "details": {
    "causes": [
      {
        "message": "error executing command in container: failed to exec in container: failed to start exec \"5f5aa33d0d7470b9d01365d10140f31a5bb2a76d44c14288b694351c583429d8\": OCI runtime exec failed: exec failed: unable to start container process: exec: \"kill\": executable file not found in $PATH: unknown"
      }
    ]
  },
  "code": 500
}
Confirmed issue still remains in latest version.