Reading returncode on a failed stream command raises an "invalid literal" error instead of the error cause
What happened (please include outputs or screenshots):
When trying to read returncode on a WSClient object that failed, a misleading error message is raised:
File "site-packages/kubernetes/stream/ws_client.py", line 238, in returncode
self._returncode = int(err['details']['causes'][0]['message'])
ValueError: invalid literal for int() with base 10: 'error executing command in container: failed to exec in container:
failed to start exec "xxxxxxxxxxxxxxxx": OCI runtime exec failed: exec failed: conta
What you expected to happen:
A more informative error is raised, containing the actual error cause: starting container process caused: exec: \"sleep 100\": executable file not found in $PATH: unknown
I added a breakpoint where returncode is read when the error happens and dumped the contents of err. It looks like this, and the real error cause is available:
{
  "metadata": {},
  "status": "Failure",
  "message": "Internal error occurred: error executing command in container: failed to exec in container: failed to start exec \"xxxxxxxxxxxxxxxxxx\": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: \"sleep 100\": executable file not found in $PATH: unknown",
  "reason": "InternalError",
  "details": {
    "causes": [
      {
        "message": "error executing command in container: failed to exec in container: failed to start exec \"276a8816e6619bb8e43a4a6664d2ac89503979d0223dee97be2c1847b7855cee\": OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: \"sleep 100\": executable file not found in $PATH: unknown"
      }
    ]
  },
  "code": 500
}
How to reproduce it (as minimally and precisely as possible):
from kubernetes.stream import stream
...
...
s = stream(
    _client.connect_get_namespaced_pod_exec,
    _pod.metadata.name,
    _pod.metadata.namespace,
    command=["sleep 100"],  # This is wrong, which later triggers the faulty error detection
    async_req=False,
    stderr=True,
    stdin=False,
    stdout=True,
    tty=False,
    _preload_content=False,
)
while (retcode := s.returncode) is None:
    tail_logs(s)
tail_logs(s)
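For reference, one way to sidestep the ValueError in the snippet above is to skip returncode entirely and inspect the error channel yourself once the stream has closed. This is only a minimal sketch reusing s and tail_logs from the repro; it distinguishes success from failure but ignores ordinary non-zero exit codes:

import yaml
from kubernetes.stream.ws_client import ERROR_CHANNEL

# Drain output until the server closes the stream, then read the status
# object from the error channel directly instead of calling returncode.
while s.is_open():
    tail_logs(s)
tail_logs(s)

err = yaml.safe_load(s.read_channel(ERROR_CHANNEL)) or {}
if err.get("status") != "Success":
    # err["message"] holds the real cause shown in the dump above.
    raise RuntimeError(err.get("message", str(err)))
retcode = 0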
Environment:
- Kubernetes version: 1.23.5
- Python version: 3.9.7
- Python client version: 23.6.0
The issue remains; here is the relevant code: https://github.com/kubernetes-client/python/blob/master/kubernetes/base/stream/ws_client.py#L238
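For anyone looking at that line, one possible shape of a fix is to call int() only when the cause actually carries an exit code and to surface the server's message otherwise. This is a hedged sketch of a replacement for the returncode property in ws_client.py, not the library's actual code; it assumes the module's existing yaml import and ERROR_CHANNEL constant, and the "ExitCode" cause reason is how non-zero exits are typically reported, so treat that detail as an assumption:

@property
def returncode(self):
    # Sketch: inside WSClient, replacing the property around line 238.
    if self.is_open():
        return None
    if self._returncode is None:
        err = yaml.safe_load(self.read_channel(ERROR_CHANNEL)) or {}
        if err.get("status") == "Success":
            self._returncode = 0
        else:
            cause = (err.get("details", {}).get("causes") or [{}])[0]
            if cause.get("reason") == "ExitCode":
                # A normal non-zero exit: the cause message is the exit code.
                self._returncode = int(cause["message"])
            else:
                # Internal errors carry a textual cause; raise it rather than
                # letting int() fail with "invalid literal".
                raise RuntimeError(err.get("message", str(err)))
    return self._returncode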
I've just run into this as well.
Here's a hacky workaround:
import yaml

from kubernetes.stream.ws_client import ERROR_CHANNEL, WSClient


def get_kubernetes_ws_client_returncode(ws_client: WSClient) -> None | int | str:
    """Workaround for broken WSClient.returncode property."""
    # This is a modified version of the WSClient.returncode code.
    # See: https://github.com/kubernetes-client/python/issues/1840
    if ws_client.is_open():
        return None
    else:
        err = ws_client.read_channel(ERROR_CHANNEL)
        err = yaml.safe_load(err)
        if err["status"] == "Success":
            return 0
        errmsg = err["details"]["causes"][0]["message"]
        try:
            return int(errmsg)
        except ValueError:
            return errmsg
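Drop-in usage with the repro above would look roughly like this (s and tail_logs come from that repro, not from the client library):

# Poll with the workaround instead of the broken property; a string result
# means the server reported an error message rather than an exit code.
while (retcode := get_kubernetes_ws_client_returncode(s)) is None:
    tail_logs(s)
tail_logs(s)

if isinstance(retcode, str):
    raise RuntimeError(retcode)  # e.g. the OCI runtime message shown below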
Example data returned when reproducing this by trying to exec "kill" without a shell:
{
  "metadata": {},
  "status": "Failure",
  "message": "Internal error occurred: error executing command in container: failed to exec in container: failed to start exec \"5f5aa33d0d7470b9d01365d10140f31a5bb2a76d44c14288b694351c583429d8\": OCI runtime exec failed: exec failed: unable to start container process: exec: \"kill\": executable file not found in $PATH: unknown",
  "reason": "InternalError",
  "details": {
    "causes": [
      {
        "message": "error executing command in container: failed to exec in container: failed to start exec \"5f5aa33d0d7470b9d01365d10140f31a5bb2a76d44c14288b694351c583429d8\": OCI runtime exec failed: exec failed: unable to start container process: exec: \"kill\": executable file not found in $PATH: unknown"
      }
    ]
  },
  "code": 500
}
Confirmed issue still remains in latest version.