
[jobmanager] Recover from job panics

Open xaionaro opened this issue 1 year ago • 2 comments

A debatable proposal (feel free to reject it without any explanation).

The goal is to mitigate problems like this one:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2e4c403]

goroutine 17643 [running]:
osf/contest/plugins/reporters/purgatory.(*Reporter).getRackSerial(0x0?, {0xc000a72f40?, 0x0?, 0x0?})
fbcode/osf/contest/plugins/reporters/purgatory/purgatory.go:272 +0x83
osf/contest/plugins/reporters/purgatory.(*Reporter).getFinalReport(0x312be36?, {0xc000a72f40?, 0x1, 0x1})
fbcode/osf/contest/plugins/reporters/purgatory/purgatory.go:336 +0x19d
osf/contest/plugins/reporters/purgatory.(*Reporter).FinalReport(0xc000253088, {0xb25ae0, 0xc001168730}, {0x5c5b00?, 0xc000e19200?}, {0xc000a72f40?, 0xc00144b7e0?, 0xc00144b7e0?}, {0xb17cc0, 0xc0010c2960})
fbcode/osf/contest/plugins/reporters/purgatory/purgatory.go:761 +0x18c
github.com/linuxboot/contest/pkg/runner.(*JobRunner).Run(0xc000270900, {0xb25ae0?, 0xc0011685a0}, 0xc000afdc20, 0x0)
fbcode/third-party-source/go/github.com/linuxboot/contest/pkg/runner/job_runner.go:261 +0x1d83
github.com/linuxboot/contest/pkg/jobmanager.(*JobManager).runJob(0xc0001e5d90, {0xb25ae0, 0xc001629310}, 0xc000afdc20, 0xc0007dcf01?)
fbcode/third-party-source/go/github.com/linuxboot/contest/pkg/jobmanager/start.go:110 +0x325
created by github.com/linuxboot/contest/pkg/jobmanager.(*JobManager).startJob
fbcode/third-party-source/go/github.com/linuxboot/contest/pkg/jobmanager/start.go:85 +0x290

If a single job fails, there is no need for the whole instance to panic.

xaionaro, Jul 04 '23 15:07