
Support log-less graceful shutdown without "Error looking up host for shardID" errors

Open · cretz opened this issue 2 years ago · 2 comments

Is your feature request related to a problem? Please describe.

Today, shutting down a single-binary server produces log output like the following:

2023-05-11T14:45:49.683-0700	ERROR	Error looking up host for shardID	{"component": "shard-controller", "address": "127.0.0.1:33401", "error": "Not enough hosts to serve the request", "operation-result": "OperationFailed", "shard-id": 1, "logging-call-at": "controller_impl.go:387"}
go.temporal.io/server/common/log.(*zapLogger).Error
	/home/runner/go/pkg/mod/go.temporal.io/server@<version>/common/log/zap_logger.go:150
go.temporal.io/server/service/history/shard.(*ControllerImpl).acquireShards.func2
	/home/runner/go/pkg/mod/go.temporal.io/server@<version>/service/history/shard/controller_impl.go:387
go.temporal.io/server/service/history/shard.(*ControllerImpl).acquireShards.func3
	/home/runner/go/pkg/mod/go.temporal.io/server@<version>/service/history/shard/controller_impl.go:427
2023-05-11T14:45:50.685-0700	WARN	Failed to poll for task.	{"service": "worker", "Namespace": "temporal-system", "TaskQueue": "temporal-sys-tq-scanner-taskqueue-0", "WorkerID": "431854@monolith@", "WorkerType": "WorkflowWorker", "Error": "error reading from server: EOF", "logging-call-at": "internal_worker_base.go:308"}
2023-05-11T14:45:50.685-0700	WARN	Failed to poll for task.	{"service": "worker", "Namespace": "temporal-system", "TaskQueue": "temporal-sys-processor-parent-close-policy", "WorkerID": "431854@monolith@", "WorkerType": "WorkflowWorker", "Error": "error reading from server: EOF", "logging-call-at": "internal_worker_base.go:308"}
2023-05-11T14:45:50.685-0700	WARN	Failed to poll for task.	{"service": "worker", "Namespace": "temporal-system", "TaskQueue": "temporal-sys-history-scanner-taskqueue-0", "WorkerID": "431854@monolith@", "WorkerType": "ActivityWorker", "Error": "error reading from server: EOF", "logging-call-at": "internal_worker_base.go:308"}
2023-05-11T14:45:50.685-0700	WARN	Failed to poll for task.	{"service": "worker", "Namespace": "temporal-system", "TaskQueue": "temporal-sys-tq-scanner-taskqueue-0", "WorkerID": "431854@monolith@", "WorkerType": "ActivityWorker", "Error": "error reading from server: EOF", "logging-call-at": "internal_worker_base.go:308"}
2023-05-11T14:45:50.685-0700	WARN	Failed to poll for task.	{"service": "worker", "Namespace": "temporal-system", "TaskQueue": "temporal-sys-batcher-taskqueue", "WorkerID": "431854@monolith@", "WorkerType": "ActivityWorker", "Error": "error reading from server: EOF", "logging-call-at": "internal_worker_base.go:308"}
2023-05-11T14:45:50.685-0700	WARN	Failed to poll for task.	{"service": "worker", "Namespace": "temporal-system", "TaskQueue": "temporal-sys-history-scanner-taskqueue-0", "WorkerID": "431854@monolith@", "WorkerType": "WorkflowWorker", "Error": "error reading from server: EOF", "logging-call-at": "internal_worker_base.go:308"}
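The WARN lines above come from the internal worker service's pollers, whose connections return EOF. One plausible reading is that the frontend they poll goes away while those pollers are still running. A common remedy is to stop services in the reverse of their start order, so clients stop before the servers they call. The Go sketch below shows only that ordering idea; the `stopper` interface, the `service` type, and the start order are assumptions for illustration, not Temporal's actual shutdown code.

```go
package main

import "fmt"

// stopper is a hypothetical interface for anything that can be shut down.
type stopper interface {
	Stop()
}

// service is a hypothetical stand-in that records when it was stopped.
type service struct {
	name    string
	stopped *[]string
}

func (s service) Stop() { *s.stopped = append(*s.stopped, s.name) }

// stopInReverse stops services in the reverse of their start order, so that
// clients (e.g. the internal worker's pollers) stop before the servers they call.
func stopInReverse(started []stopper) {
	for i := len(started) - 1; i >= 0; i-- {
		started[i].Stop()
	}
}

func main() {
	var stopped []string
	// Assumed start order: dependencies first, the polling worker last.
	started := []stopper{
		service{"history", &stopped},
		service{"matching", &stopped},
		service{"frontend", &stopped},
		service{"worker", &stopped},
	}
	stopInReverse(started)
	fmt.Println(stopped) // [worker frontend matching history]
}
```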

Describe the solution you'd like

Any solution that doesn't make users think something went wrong when they read the logs. If this can only be partly solved here, with the rest handled in https://github.com/temporalio/cli (for example, by shutting down properly or by swallowing these errors), that's no problem.
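For the shard-controller message, one option is to let the component know it is stopping and, from then on, log expected lookup failures at a lower level without a stack trace. The Go sketch below assumes a stopping flag that is set at the start of shutdown; `shardController`, `Stop`, `reportLookupError`, and `errNotEnoughHosts` are hypothetical names, not Temporal's types, and log lines are captured in a slice instead of going through a real logger.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errNotEnoughHosts is a hypothetical stand-in for the membership error in the log above.
var errNotEnoughHosts = errors.New("Not enough hosts to serve the request")

// shardController is a hypothetical, stripped-down component that keeps
// trying to acquire shards until it is told to stop.
type shardController struct {
	stopping atomic.Bool
	logs     []string // captured log lines, to keep the sketch self-contained
}

// Stop marks the controller as shutting down before its peers leave membership.
func (c *shardController) Stop() { c.stopping.Store(true) }

// reportLookupError records a host-lookup failure. After Stop, the failure is
// expected, so it is logged at DEBUG instead of ERROR.
func (c *shardController) reportLookupError(shardID int, err error) {
	level := "ERROR"
	if c.stopping.Load() {
		level = "DEBUG"
	}
	c.logs = append(c.logs, fmt.Sprintf("%s\tError looking up host for shardID\tshard-id=%d error=%q", level, shardID, err))
}

func main() {
	c := &shardController{}
	c.reportLookupError(1, errNotEnoughHosts) // normal operation: a real problem
	c.Stop()
	c.reportLookupError(1, errNotEnoughHosts) // during shutdown: expected
	for _, line := range c.logs {
		fmt.Println(line)
	}
}
```

The flag is checked at log time rather than at lookup time, so a lookup that started before shutdown but failed after it is still classified as expected.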

cretz, May 22 '23

Also, sometimes you get:

2023-10-05T11:58:24.496+0300	ERROR	Unable to call matching.PollActivityTaskQueue.	{"service": "frontend", "wf-task-queue-name": "/_sys/temporal-sys-history-scanner-taskqueue-0/2", "timeout": "1m9.999694s", "error": "error reading from server: EOF", "logging-call-at": "workflow_handler.go:1133"}

This is followed by a full stack trace.

cretz, Oct 05 '23

And by "full trace", @cretz means a V E R Y full trace 😅

garrettks, Jan 05 '24