csharp icon indicating copy to clipboard operation
csharp copied to clipboard

"Error while copying content to a stream" when watching empty set

Open nabsul opened this issue 2 years ago • 11 comments

When I attempt to watch for changes on a set of objects that is empty, the response body from the Kubernetes API is empty and causes an exception.

I am able to repro this in Linqpad with this code:

var cfg = KubernetesClientConfiguration.BuildDefaultConfig();
var client = new Kubernetes(cfg);
var result = await client.ListConfigMapForAllNamespacesWithHttpMessagesAsync(watch: true, labelSelector: "test-not-exist=val", timeoutSeconds: 10);

Notes:

  • For this error to repro, you must create a List request that returns zero results
  • I have observed this error with other data types, not just ConfigMap

The resulting error is:

HttpRequestException
Error while copying content to a stream.
Data
(0 items)
HelpLinknull
HResult-2147024858
InnerException
MessageError while copying content to a stream.
SourceSystem.Net.Http
StackTrace at System.Net.Http.HttpContent.LoadIntoBufferAsyncCore(Task serializeToStreamTask, MemoryStream tempBuffer)
at System.Net.Http.HttpContent.WaitAndReturnAsync[TState,TResult](Task waitTask, TState state, Func`2 returnFunc)
at k8s.Kubernetes.CreateResultAsync[T](HttpRequestMessage httpRequest, HttpResponseMessage httpResponse, Nullable`1 watch, CancellationToken cancellationToken) in /_/src/KubernetesClient/Kubernetes.cs:line 84
at k8s.AbstractKubernetes.ListConfigMapForAllNamespacesWithHttpMessagesAsync(Nullable`1 allowWatchBookmarks, String continueParameter, String fieldSelector, String labelSelector, Nullable`1 limit, Nullable`1 pretty, String resourceVersion, String resourceVersionMatch, Nullable`1 timeoutSeconds, Nullable`1 watch, IReadOnlyDictionary`2 customHeaders, CancellationToken cancellationToken) in /_/src/KubernetesClient.Basic/LibKubernetesGenerator/LibKubernetesGenerator.KubernetesClientSourceGenerator/Kubernetes.g.cs:line 972
at UserQuery.Main(), line 3
StatusCodenull
TargetSite

nabsul avatar Jun 19 '22 20:06 nabsul

I believe it is a true EOF from server side why timeoutSeconds == 10? could you please make it HUUUGE? see #835

tg123 avatar Jun 20 '22 07:06 tg123

I made it small to easily demo the issue. But the same happens with a large timeout.

nabsul avatar Jun 20 '22 12:06 nabsul

Also, I saw the above issue and PR which seem very similar to my issue. I'm on the latest client version, but I still get the above error.

nabsul avatar Jun 20 '22 12:06 nabsul

tested var result = await client.ListConfigMapForAllNamespacesWithHttpMessagesAsync(watch: true, labelSelector: "test-not-exist=val"); (no timeout param) locally

cant repro looks like the connection is closed from server side, could you please do tcpdump to verify?

tg123 avatar Jun 20 '22 20:06 tg123

@tg123: How long did you wait for that statement to complete? If I remove the timeout, the error happens, but only after about an hour. That's the reason I added the 10 second timeout. It's to demonstrate the error faster.

nabsul avatar Jun 20 '22 21:06 nabsul

Below are some sample logs with timestamps from my open source project. I'm currently working around this error with a try-catch and infinite loop of the watch: https://github.com/nabsul/kcert/blob/0e3e5ad998bd4b0063accfe2dbc1d0eca18f0ba4/Services/K8sWatchClient.cs#L66

{
  "Timestamp": "2022:06:20 10:05:18.377804",
  "EventId": 0,
  "LogLevel": "Information",
  "Category": "KCert.Services.K8sClient",
  "Message": "Empty Kubernetes client result threw an exception. Retrying V1ConfigMap[kcert.dev/cert-request=request].",
  "State": {
    "Message": "Empty Kubernetes client result threw an exception. Retrying V1ConfigMap[kcert.dev/cert-request=request].",
    "type": "V1ConfigMap",
    "label": "kcert.dev/cert-request=request",
    "{OriginalFormat}": "Empty Kubernetes client result threw an exception. Retrying {type}[{label}]."
  }
}
{
  "Timestamp": "2022:06:20 11:01:06.095747",
  "EventId": 0,
  "LogLevel": "Information",
  "Category": "KCert.Services.K8sClient",
  "Message": "Empty Kubernetes client result threw an exception. Retrying V1ConfigMap[kcert.dev/cert-request=request].",
  "State": {
    "Message": "Empty Kubernetes client result threw an exception. Retrying V1ConfigMap[kcert.dev/cert-request=request].",
    "type": "V1ConfigMap",
    "label": "kcert.dev/cert-request=request",
    "{OriginalFormat}": "Empty Kubernetes client result threw an exception. Retrying {type}[{label}]."
  }
}
{
  "Timestamp": "2022:06:20 11:32:03.076967",
  "EventId": 0,
  "LogLevel": "Information",
  "Category": "KCert.Services.K8sClient",
  "Message": "Empty Kubernetes client result threw an exception. Retrying V1ConfigMap[kcert.dev/cert-request=request].",
  "State": {
    "Message": "Empty Kubernetes client result threw an exception. Retrying V1ConfigMap[kcert.dev/cert-request=request].",
    "type": "V1ConfigMap",
    "label": "kcert.dev/cert-request=request",
    "{OriginalFormat}": "Empty Kubernetes client result threw an exception. Retrying {type}[{label}]."
  }
}

nabsul avatar Jun 20 '22 21:06 nabsul

I mean it is a feature, not a bug the server side cut the connection because of timeout and you get eof exception because no more event coming.

tg123 avatar Jun 21 '22 06:06 tg123

I'm not super convinced that this should be considered a feature. The behavior of the API timing out and returning an empty body is normal and expected. Shouldn't the client check for this case and handle it instead of throwing an exception while trying to read an empty stream?

nabsul avatar Jun 21 '22 15:06 nabsul

the behavior is same in golang client the server side cut the connection and you will see no more event

opt := v1.ListOptions{
		LabelSelector: "not-exists",
		Watch:         true,
		TimeoutSeconds: new(int64),
	}

	*opt.TimeoutSeconds = 5

	wacher, err := api.Pods("default").Watch(context.TODO(), opt)

	if err != nil {
		panic(err)
	}

	fmt.Println("start watch")
	for evt := range wacher.ResultChan() {
		fmt.Println(evt)
	}
	fmt.Println("no more events")

also in tcp level (127.0.0.1.8001 as api server)

16:38:06.023277 IP 127.0.0.1.8001 > 127.0.0.1.37494: Flags [P.], seq 342:347, ack 172, win 512, options [nop,nop,TS val 387078250 ecr 387073250], length 5
        0x0000:  0000 0304 0006 0000 0000 0000 0000 0800
        0x0010:  4500 0039 22ef 4000 4006 19ce 7f00 0001
        0x0020:  7f00 0001 1f41 9276 2968 3680 ad95 b9b8
        0x0030:  8018 0200 fe2d 0000 0101 080a 1712 586a
        0x0040:  1712 44e2 300d 0a0d 0a

you can see 0d 0a0d 0a from the server which means end of http response

tg123 avatar Jun 21 '22 16:06 tg123

@nabsul let me know if you still have any idea about this issue I also revisited current impl, I believe the async foreach should yield break comparing to go impl here is yield break if the first line is not null https://github.com/kubernetes-client/csharp/blob/56646aeed3ee289146be9efda4d8da8756b8ac69/src/KubernetesClient/Watcher.cs#L182

tg123 avatar Jun 29 '22 15:06 tg123

Thanks for the follow up @tg123. a yield break makes a lot of sense to me. That way I would not need to wrap my code like I'm doing here: https://github.com/nabsul/kcert/blob/0e3e5ad998bd4b0063accfe2dbc1d0eca18f0ba4/Services/K8sWatchClient.cs#L66

nabsul avatar Jun 29 '22 18:06 nabsul

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 27 '22 19:09 k8s-triage-robot

No worries. We had a difference in opinion on what should be done here anyways. I'm closing this issue.

nabsul avatar Sep 27 '22 20:09 nabsul