gofail icon indicating copy to clipboard operation
gofail copied to clipboard

failpoint delete may block on "defer Release"

Open gyuho opened this issue 6 years ago • 2 comments

If failpoint is injected within for-loop, lock may never be released. And delete call would block.

https://github.com/coreos/gofail/pull/13#issuecomment-383972728

OK, defer is broken in general. e.g., this will starve the delete too:

for ctx.Err() == nil {
  // gofail: var abc struct{}
  // fmt.Println("abc")
  whatever()
}

gyuho avatar Apr 24 '18 16:04 gyuho

@heyitsanthony How about removing Release completely, since terms have been already evaluated by the time Acquire returns? Is there any other case that I am missing?

Before

raftDropHeartbeatLabel: /* gofail-label */
	for {
		m, err := dec.decode()
		if err != nil {
			cr.mu.Lock()
			cr.close()
			cr.mu.Unlock()
			return err
		}

		if vraftDropHeartbeat, __fpErr := __fp_raftDropHeartbeat.Acquire(); __fpErr == nil {
			defer __fp_raftDropHeartbeat.Release()
			_, __fpTypeOK := vraftDropHeartbeat.(struct{})
			if !__fpTypeOK {
				goto __badTyperaftDropHeartbeat
			}
			continue raftDropHeartbeatLabel
		__badTyperaftDropHeartbeat:
			__fp_raftDropHeartbeat.BadType(vraftDropHeartbeat, "struct{}")
		}

After

raftDropHeartbeatLabel: /* gofail-label */
	for {
		m, err := dec.decode()
		if err != nil {
			cr.mu.Lock()
			cr.close()
			cr.mu.Unlock()
			return err
		}

		if vraftDropHeartbeat, __fpErr := __fp_raftDropHeartbeat.Acquire(); __fpErr == nil {
-			defer __fp_raftDropHeartbeat.Release()
			_, __fpTypeOK := vraftDropHeartbeat.(struct{})
			if !__fpTypeOK {
				goto __badTyperaftDropHeartbeat
			}
			continue raftDropHeartbeatLabel
		__badTyperaftDropHeartbeat:
			__fp_raftDropHeartbeat.BadType(vraftDropHeartbeat, "struct{}")
		}

gyuho avatar Apr 25 '18 16:04 gyuho

@gyuho it's important for current delete semantics:

// gofail: var crash struct{}
// panic("crash")

if the lock isn't held for the duration of the fp, then it's possible an inflight fp will crash the program after the delete is ack'd. One option is to have something like // gofail-go: to indicate that a delete will prevent new fp's from being issued, but has no control over whether any are inflight. This would also get around delete starvation for fp's that wait:

func a() {
    ctx, cancel := context.WithTimeout(context.Background(), 10 * time.Second)
    go f(ctx)
}

func f(ctx context.Context) {
// gofail-go: var waitForDone struct{}
// <-ctx.Done()
...
}

heyitsanthony avatar Apr 25 '18 17:04 heyitsanthony

Eventually I removed the lock for each failpoint.

When the server(runtime) processes a http request, it acquires the global write lock. This prevents all failpoints from being triggered. It ensures the server(runtime) doesn't panic due to any failpoints during processing the HTTP request.

It may be inefficient, but correctness is more important than efficiency. Usually users will not enable too many failpoints at a time, so it (the efficiency) isn't a problem.

https://github.com/etcd-io/gofail/pull/38

ahrtr avatar Nov 24 '22 21:11 ahrtr

Fully agree with the decision.

serathius avatar Nov 25 '22 08:11 serathius

Resolved in https://github.com/etcd-io/gofail/pull/38

ahrtr avatar Nov 25 '22 11:11 ahrtr