
multi-tenant: Remove stack trace on error during start-server

Open ajstorm opened this issue 1 year ago • 2 comments

On the DRT cluster we're seeing a long stack trace when we hit this error on start-server when nodes are draining or have recently failed:

E240228 05:13:22.617985 8260 jobs/job_scheduler.go:434 ⋮ [T3,Vapplication,n2,tenant-orchestration,tenant=‹application›,start-server] 181016  error executing schedules: ‹find-scheduled-jobs›: failed to read query result: query execution canceled

The stack trace being returned (by this log message) is not necessary for problem diagnosis, and should be removed for log cleanliness.

Jira issue: CRDB-36371

ajstorm avatar Mar 04 '24 22:03 ajstorm

@ajstorm - when we see this error during start-server, is the stack trace longer than the log message included in this issue? Looking at the code, I assume multiple log messages are reported (at a scheduled interval). I could remove the stack trace entirely, but I'm not sure whether that would make debugging harder in other scenarios. I plan to suppress the stack trace only when we are unable to fetch the scheduled jobs (for any reason). What do you think?

cthumuluru-crdb avatar Aug 02 '24 04:08 cthumuluru-crdb

I found an instance of a similar log with the help of the DRT team. As noted in the issue, the stack trace is not necessary for problem diagnosis. I'm going to remove the stack trace from the log.

cthumuluru-crdb avatar Aug 29 '24 04:08 cthumuluru-crdb