ECONNRESET error from Kubernetes watch after a few minutes
**Describe the bug**
If I use the Kubernetes watch to listen for resource changes, I get an ECONNRESET and the watch stops. Is there any chance the watch could handle underlying connection errors and restart on its own?
**Client Version**
0.20.0
**To Reproduce**
Steps to reproduce the behavior:
- Start a watch and wait longer than the `setTimeout` or `setKeepAlive` setting in the Watch config.
**Expected behavior**
A watch runs without connection issues.
**Example Code**
```typescript
import { KubeConfig, V1Pod, Watch } from '@kubernetes/client-node';

// 'Context' is an application-specific type; Context['log'] is just a logger with an info() method.
function waitForPodCompletion(log: Context['log'], k8sConfig: KubeConfig, podNamespace: string, resourceVersion?: string, jobName?: string): Promise<V1Pod> {
    let lastResourceVersion = resourceVersion;
    return new Promise<V1Pod>((resolve, reject) => {
        const watch = new Watch(k8sConfig);
        const queryParams: { labelSelector: string, resourceVersion?: string } = { labelSelector: `job-name=${jobName}` };
        if (resourceVersion) {
            queryParams.resourceVersion = resourceVersion;
        }
        watch.watch(`/api/v1/namespaces/${podNamespace}/pods`, queryParams, (eventType, pod: V1Pod) => {
            lastResourceVersion = pod.metadata?.resourceVersion;
            // log.info("WATCH RESULT" + JSON.stringify(pod));
            if (eventType === 'ADDED' && pod.metadata?.name) {
                log.info(`Job pod ${pod.metadata.name} ${pod.metadata?.resourceVersion} added.`);
            }
            if (eventType === 'MODIFIED' && pod.metadata?.name) {
                log.info(`Job pod ${pod.metadata.name} status: ${pod.status?.phase}, resourceVersion: ${pod.metadata?.resourceVersion}.`);
                if (pod.status?.phase === 'Succeeded') {
                    // log.info("WATCH RESULT" + JSON.stringify(pod));
                    resolve(pod);
                } else if (pod.status?.phase === 'Failed') {
                    reject(new Error(`Job failed. Pod ${pod.metadata.name} status: ${pod.status.phase} startTime: ${pod.status.startTime}.`));
                }
            }
        }, (error: { code: string, message: string, stack: string }) => {
            // strange: this callback is also invoked with null shortly after the ECONNRESET
            if (error) {
                reject(error);
            }
        });
    }).catch((onrejected) => {
        if (onrejected && onrejected.code === 'ECONNRESET') {
            log.info(`Restart Watch with ${lastResourceVersion}.`);
            return waitForPodCompletion(log, k8sConfig, podNamespace, lastResourceVersion, jobName);
        } else {
            throw onrejected;
        }
    });
}
```
**Environment (please complete the following information):**
- OS: Windows
- NodeJS Version: v20.10.0
- Cloud runtime: Red Hat OpenShift
A watch is tied to a single TCP stream, so when that stream breaks you need to start a new watch (and you also need to re-list, in case you missed something).
The informer class encapsulates this logic and is probably what you are looking for:
https://github.com/kubernetes-client/javascript/blob/master/src/informer.ts
(FWIW, I think the "informer" name is confusing, but it became established as the standard name in the Go client library, so we use it here too for consistency.)
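For reference, a minimal sketch of the informer-based version using the restart-on-error pattern from the client README (assuming @kubernetes/client-node 0.20.0; the namespace, label selector, and 5-second back-off below are placeholders, not anything the library prescribes):

```typescript
import { CoreV1Api, KubeConfig, makeInformer, V1Pod } from '@kubernetes/client-node';

const kc = new KubeConfig();
kc.loadFromDefault();

const k8sApi = kc.makeApiClient(CoreV1Api);

// Placeholders for this sketch; substitute your own namespace / job label.
const namespace = 'default';
const labelSelector = 'job-name=my-job';

// The list function is re-run each time the informer (re)starts,
// which covers the re-list requirement mentioned above.
const listFn = () =>
    k8sApi.listNamespacedPod(namespace, undefined, undefined, undefined, undefined, labelSelector);

const informer = makeInformer(kc, `/api/v1/namespaces/${namespace}/pods`, listFn, labelSelector);

informer.on('add', (pod: V1Pod) => console.log(`added: ${pod.metadata?.name}`));
informer.on('update', (pod: V1Pod) => console.log(`updated: ${pod.metadata?.name} (${pod.status?.phase})`));
informer.on('delete', (pod: V1Pod) => console.log(`deleted: ${pod.metadata?.name}`));

// The informer does not restart itself; the README pattern is to restart it
// from the error handler after a short back-off (5s here is arbitrary).
informer.on('error', (err: any) => {
    console.error(err);
    setTimeout(() => informer.start(), 5000);
});

informer.start();
```

Because the informer re-lists on every `start()`, you do not need to track `resourceVersion` yourself the way the watch example above does.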
Thanks for the information, but the informer class has the same problem: it also surfaces the underlying connection errors.
Same issue here with the informer. I tried the workaround of periodically restarting the informer as suggested in https://github.com/kubernetes-client/javascript/issues/596, but that ran into a new problem (see https://github.com/kubernetes-client/javascript/issues/1598).
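For reference, a minimal sketch of that periodic-restart workaround (this is only one reading of the suggestion in the issue linked above; the interval is an arbitrary placeholder, and it assumes the `Informer` returned by `makeInformer` in @kubernetes/client-node 0.20.0, which exposes `start()`/`stop()`):

```typescript
import { Informer, V1Pod } from '@kubernetes/client-node';

// Periodically tear down and restart an existing informer (created with makeInformer).
function restartPeriodically(informer: Informer<V1Pod>, intervalMs: number): NodeJS.Timeout {
    return setInterval(async () => {
        try {
            await informer.stop();   // stop the current list/watch
            await informer.start();  // re-list and re-watch from scratch
        } catch (err) {
            console.error('informer restart failed', err);
        }
    }, intervalMs);
}

// Example: restart every 10 minutes.
// restartPeriodically(informer, 10 * 60 * 1000);
```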
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with Issue Triage
> Please send feedback to sig-contributor-experience at kubernetes/community.
> /close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.