Encountered error in sinkFn - CANCELLED: client cancelled

Open nagarajatantry opened this issue 10 months ago • 2 comments

After updating the numaflow controller from rc1 to rc4, I see this error message in the sink vertex. The sink pods remained in the Running state.

Error in numa container

{"level":"error","ts":"2024-04-08T18:38:21.170233545Z","logger":"numaflow.Sink-processor","caller":"forward/forward.go:415","msg":"Retrying failed messages","pipeline":"kafka-test-pipeline-1","vertex":"custom-out","errors":{"gRPC client.SinkFn failed, failed to execute stream.Send(value:\"..."  event_time:{seconds:1712601333  nanos:777000000}  watermark:{seconds:-62135596800}  id:\"\\x00\\x00\\x00\\x00\\x00\\xe25\\xc5-input-0\"): rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8":98},"pipeline":"kafka-test-pipeline-1","vertex":"custom-out","partition_name":"custom-out","stacktrace":"github.com/numaproj/numaflow/pkg/sinks/forward.(*DataForward).writeToBuffer\n\t/Users/yhl01/Documents/numaproj/numaflow/pkg/sinks/forward/forward.go:415\ngithub.com/numaproj/numaflow/pkg/sinks/forward.(*DataForward).forwardAChunk\n\t/Users/yhl01/Documents/numaproj/numaflow/pkg/sinks/forward/forward.go:271\ngithub.com/numaproj/numaflow/pkg/sinks/forward.(*DataForward).Start.func1\n\t/Users/yhl01/Documents/numaproj/numaflow/pkg/sinks/forward/forward.go:133"}

Error in custom sink container

2024-04-08T18:38:21,173+0000-ERROR-"grpc-default-executor-0" -i.n.n.sinker.Service-68-Encountered error in sinkFn - CANCELLED: client cancelled 

nagarajatantry avatar Apr 08 '24 19:04 nagarajatantry

This is because of stale messages in the ISB. I assume the error count would have spiked and alerted the user. Should we think about a better user experience here?

vigith avatar Apr 08 '24 19:04 vigith

This was in a non-prod environment with very low TPS, so it would have been difficult to catch with an alert. We may need a better way to detect this from the platform side, since the id field is managed internally by the platform.

nagarajatantry avatar Apr 08 '24 19:04 nagarajatantry