gofail
gofail copied to clipboard
failpoint delete may block on "defer Release"
If failpoint is injected within for-loop, lock may never be released. And delete call would block.
https://github.com/coreos/gofail/pull/13#issuecomment-383972728
OK, defer is broken in general. e.g., this will starve the delete too:
for ctx.Err() == nil {
// gofail: var abc struct{}
// fmt.Println("abc")
whatever()
}
@heyitsanthony How about removing Release
completely, since terms have been already evaluated by the time Acquire
returns? Is there any other case that I am missing?
Before
raftDropHeartbeatLabel: /* gofail-label */
for {
m, err := dec.decode()
if err != nil {
cr.mu.Lock()
cr.close()
cr.mu.Unlock()
return err
}
if vraftDropHeartbeat, __fpErr := __fp_raftDropHeartbeat.Acquire(); __fpErr == nil {
defer __fp_raftDropHeartbeat.Release()
_, __fpTypeOK := vraftDropHeartbeat.(struct{})
if !__fpTypeOK {
goto __badTyperaftDropHeartbeat
}
continue raftDropHeartbeatLabel
__badTyperaftDropHeartbeat:
__fp_raftDropHeartbeat.BadType(vraftDropHeartbeat, "struct{}")
}
After
raftDropHeartbeatLabel: /* gofail-label */
for {
m, err := dec.decode()
if err != nil {
cr.mu.Lock()
cr.close()
cr.mu.Unlock()
return err
}
if vraftDropHeartbeat, __fpErr := __fp_raftDropHeartbeat.Acquire(); __fpErr == nil {
- defer __fp_raftDropHeartbeat.Release()
_, __fpTypeOK := vraftDropHeartbeat.(struct{})
if !__fpTypeOK {
goto __badTyperaftDropHeartbeat
}
continue raftDropHeartbeatLabel
__badTyperaftDropHeartbeat:
__fp_raftDropHeartbeat.BadType(vraftDropHeartbeat, "struct{}")
}
@gyuho it's important for current delete semantics:
// gofail: var crash struct{}
// panic("crash")
if the lock isn't held for the duration of the fp, then it's possible an inflight fp will crash the program after the delete is ack'd. One option is to have something like // gofail-go:
to indicate that a delete will prevent new fp's from being issued, but has no control over whether any are inflight. This would also get around delete starvation for fp's that wait:
func a() {
ctx, cancel := context.WithTimeout(context.Background(), 10 * time.Second)
go f(ctx)
}
func f(ctx context.Context) {
// gofail-go: var waitForDone struct{}
// <-ctx.Done()
...
}
Eventually I removed the lock for each failpoint.
When the server(runtime) processes a http request, it acquires the global write lock. This prevents all failpoints from being triggered. It ensures the server(runtime) doesn't panic due to any failpoints during processing the HTTP request.
It may be inefficient, but correctness is more important than efficiency. Usually users will not enable too many failpoints at a time, so it (the efficiency) isn't a problem.
https://github.com/etcd-io/gofail/pull/38
Fully agree with the decision.
Resolved in https://github.com/etcd-io/gofail/pull/38