memCacheClient: cached transient error leads to resource lookup failures
The error `http2: client connection force closed via ClientConn.Close` occurred for some reason (possibly the api-server was under load). This led our deployment tool (a helm-based one, which makes use of `memCacheClient`) to print errors and to fail further resource lookup requests by group-version.

Debugging showed that this happens when the errored responses have been cached and are never renewed during subsequent lookups. It is possibly a failure of the `isTransientError` helper, which does not take the `http2: ... ClientConn.Close` error into account.
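For illustration, here is a minimal sketch of what a broadened transient-error check could look like. This is not the actual client-go helper; the string match on the http2 message is an assumption about how the missing case might be detected, since net/http2 appears not to export this error as a sentinel.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"syscall"
)

// isTransientError is a hypothetical sketch, not client-go source. The idea:
// classify errors that mean "apiserver temporarily unreachable" as transient,
// so the cached discovery client retries instead of caching the failure.
func isTransientError(err error) bool {
	if err == nil {
		return false
	}
	// Connection refused/reset: the classic transient connection errors.
	if errors.Is(err, syscall.ECONNREFUSED) || errors.Is(err, syscall.ECONNRESET) {
		return true
	}
	// Assumed addition for this issue: the forced-close error is a plain
	// error value, so matching the message text is the pragmatic (if
	// fragile) way to treat it as transient too.
	return strings.Contains(err.Error(), "http2: client connection force closed")
}

func main() {
	err := errors.New(`Get "https://domain/apis/scheduling.k8s.io/v1?timeout=32s": http2: client connection force closed via ClientConn.Close`)
	fmt.Println(isTransientError(err)) // true under this sketch
}
```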
Logs:
```
...
E0603 13:19:34.358760 15514 memcache.go:196] couldn't get resource list for networking.k8s.io/v1beta1: Get "https://domain/apis/networking.k8s.io/v1beta1?timeout=32s": http2: client connection force closed via ClientConn.Close
E0603 13:19:34.358768 15514 memcache.go:196] couldn't get resource list for scheduling.k8s.io/v1: Get "https://domain/apis/scheduling.k8s.io/v1?timeout=32s": http2: client connection force closed via ClientConn.Close
E0603 13:19:34.358783 15514 memcache.go:196] couldn't get resource list for deckhouse.io/v1alpha2: Get "https://domain/apis/deckhouse.io/v1alpha2?timeout=32s": http2: client connection force closed via ClientConn.Close
E0603 13:19:34.358849 15514 memcache.go:196] couldn't get resource list for coordination.k8s.io/v1beta1: Get "https://apidomain/apis/coordination.k8s.io/v1beta1?timeout=32s": http2: client connection force closed via ClientConn.Close
Error: helm templates rendering failed: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "Deployment" in version "apps/v1"
```
Some more debug
Our helm-based tool (which uses cli-runtime and, in turn, client-go) behaves strangely: the internal error is shadowed and surfaces only as an unrelated resource lookup error.

Debugging shows that this error shadowing is explained by this piece of code (https://github.com/kubernetes/client-go/blob/master/restmapper/discovery.go#L151):
```go
func GetAPIGroupResources(cl discovery.DiscoveryInterface) ([]*APIGroupResources, error) {
	gs, rs, err := cl.ServerGroupsAndResources()
	if rs == nil || gs == nil {
		return nil, err
		// TODO track the errors and update callers to handle partial errors.
	}
	// ...
}
```
Partial errors are ignored here. Errors are also ignored when `rs` or `gs` are non-nil but contain zero elements. A sketch of how a caller could recover the dropped errors follows.
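`ServerGroupsAndResources` does wrap the per-group failures in `discovery.ErrGroupDiscoveryFailed`, so a caller can bypass `GetAPIGroupResources` and surface partial errors itself. The helper below is hypothetical, not part of client-go or cli-runtime.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

// logPartialDiscoveryErrors is a hypothetical helper showing how a caller
// could report partial discovery failures instead of losing them the way
// GetAPIGroupResources does.
func logPartialDiscoveryErrors(cl discovery.DiscoveryInterface) error {
	gs, rs, err := cl.ServerGroupsAndResources()
	if err != nil {
		if discovery.IsGroupDiscoveryFailedError(err) {
			// Partial failure: some group-versions could not be fetched;
			// ErrGroupDiscoveryFailed carries the per-group errors.
			for gv, gvErr := range err.(*discovery.ErrGroupDiscoveryFailed).Groups {
				fmt.Printf("warning: couldn't discover %s: %v\n", gv, gvErr)
			}
		} else {
			// Total failure: nothing usable came back.
			return err
		}
	}
	fmt.Printf("discovered %d groups and %d resource lists\n", len(gs), len(rs))
	return nil
}

func main() {
	// Assumes a reachable cluster via the default kubeconfig.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	if err := logPartialDiscoveryErrors(dc); err != nil {
		panic(err)
	}
}
```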
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.