contest
[jobmanager] Recover from job panics
A debatable proposal (feel free to reject it without any explanation).
It mitigates problems like this one:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2e4c403]
goroutine 17643 [running]:
osf/contest/plugins/reporters/purgatory.(*Reporter).getRackSerial(0x0?, {0xc000a72f40?, 0x0?, 0x0?})
fbcode/osf/contest/plugins/reporters/purgatory/purgatory.go:272 +0x83
osf/contest/plugins/reporters/purgatory.(*Reporter).getFinalReport(0x312be36?, {0xc000a72f40?, 0x1, 0x1})
fbcode/osf/contest/plugins/reporters/purgatory/purgatory.go:336 +0x19d
osf/contest/plugins/reporters/purgatory.(*Reporter).FinalReport(0xc000253088, {0xb25ae0, 0xc001168730}, {0x5c5b00?, 0xc000e19200?}, {0xc000a72f40?, 0xc00144b7e0?, 0xc00144b7e0?}, {0xb17cc0, 0xc0010c2960})
fbcode/osf/contest/plugins/reporters/purgatory/purgatory.go:761 +0x18c
github.com/linuxboot/contest/pkg/runner.(*JobRunner).Run(0xc000270900, {0xb25ae0?, 0xc0011685a0}, 0xc000afdc20, 0x0)
fbcode/third-party-source/go/github.com/linuxboot/contest/pkg/runner/job_runner.go:261 +0x1d83
github.com/linuxboot/contest/pkg/jobmanager.(*JobManager).runJob(0xc0001e5d90, {0xb25ae0, 0xc001629310}, 0xc000afdc20, 0xc0007dcf01?)
fbcode/third-party-source/go/github.com/linuxboot/contest/pkg/jobmanager/start.go:110 +0x325
created by github.com/linuxboot/contest/pkg/jobmanager.(*JobManager).startJob
fbcode/third-party-source/go/github.com/linuxboot/contest/pkg/jobmanager/start.go:85 +0x290
If a single job fails, the whole instance has no need to panic.
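For illustration, here is a minimal sketch of the general technique: a deferred `recover` inside the goroutine that runs the job, so a panic becomes an ordinary error for that job only. This is not the actual change in this PR. The names `runJobSafely`, `reporter`, and `rackSerial` are hypothetical and only mimic the nil pointer dereference from the trace above.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"sync"
)

// reporter is a hypothetical stand-in for a plugin that may be nil.
type reporter struct{ serial string }

// rackSerial dereferences r, so calling it on a nil *reporter panics.
func (r *reporter) rackSerial() string { return r.serial }

// runJobSafely is a hypothetical wrapper: it runs job and converts a
// panic into an error instead of letting it crash the whole process.
// recover only works in a deferred call on the panicking goroutine,
// so this wrapper must be called inside the job's own goroutine.
func runJobSafely(jobID int, job func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("job %d panicked: %v\n%s", jobID, r, debug.Stack())
		}
	}()
	return job()
}

func main() {
	jobs := map[int]func() error{
		1: func() error { return nil },
		2: func() error {
			var r *reporter // nil, as in the trace above
			fmt.Println(r.rackSerial())
			return nil
		},
	}

	var wg sync.WaitGroup
	for id, job := range jobs {
		wg.Add(1)
		go func(id int, job func() error) {
			defer wg.Done()
			if err := runJobSafely(id, job); err != nil {
				fmt.Printf("job %d failed: %v\n", id, err)
				return
			}
			fmt.Printf("job %d succeeded\n", id)
		}(id, job)
	}
	wg.Wait()
	fmt.Println("instance is still running")
}
```

Note that this only protects the goroutine that runs the wrapper. A panic in any other goroutine spawned by the job would still take the process down.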
Codecov Report
Patch coverage: 100.00% and project coverage change: +0.02 :tada:
Comparison is base (fa98f00) 61.77% compared to head (2ab1456) 61.80%.
:exclamation: Current head 2ab1456 differs from pull request most recent head dfc6f87. Consider uploading reports for the commit dfc6f87 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## develop #169 +/- ##
===========================================
+ Coverage 61.77% 61.80% +0.02%
===========================================
Files 131 131
Lines 9228 9234 +6
===========================================
+ Hits 5701 5707 +6
Misses 2855 2855
Partials 672 672
| Flag | Coverage Δ | |
|---|---|---|
| e2e | 49.71% <100.00%> (+0.04%) | :arrow_up: |
| integration | 56.86% <100.00%> (+<0.01%) | :arrow_up: |
| unittests | 46.03% <0.00%> (-0.15%) | :arrow_down: |
| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/jobmanager/jobmanager.go | 77.41% <100.00%> (+0.24%) | :arrow_up: |
| pkg/jobmanager/start.go | 76.85% <100.00%> (+0.89%) | :arrow_up: |
What about the other possible failures? This now only handles the job start case, but the handling of other API events may fail too. Am I reading this wrong?