
Allow waiting for all goroutines to exit on client connection close

Open twz123 opened this issue 2 months ago • 4 comments

Use case(s) - what problem will this feature solve?

When a client connection closes, the goroutines associated with that connection are cancelled. However, Close() returns before all of those goroutines have actually exited.

This usually isn't a problem in long-lived applications, but in short-lived clients, tests, or graceful shutdown paths, the asynchronous cleanup can cause issues. Those goroutines may still perform observable side effects after Close() returns, making it impossible to know when all background work has finished. This leads to unpredictable behavior in situations where deterministic teardown matters.

A common example is shutting down the logging subsystem: background tasks may still emit log messages after Close() returns, racing with the logging shutdown. Similarly, test scenarios that verify proper resource cleanup can become unreliable.

Providing a way to wait until all internal goroutines have exited would make shutdowns more predictable and simplify synchronization in these cases.

Proposed Solution

Provide a mechanism that blocks until all goroutines spawned by a connection have completed their cleanup and exited. I'm not proposing a specific API - whatever fits best.

Internally, this could be implemented by using a sync.WaitGroup or a similar mechanism to track active goroutines that aren't already tracked elsewhere. For backward compatibility, the existing Close() method should remain "non-blocking".
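To make the shape of the idea concrete, here is a minimal sketch (the type, field, and method names are hypothetical and not grpc-go's actual internals): connection-scoped goroutines register with a sync.WaitGroup, and a separate wait call blocks until all of them have exited, leaving Close() itself untouched.

// Minimal sketch of the proposal; the names (trackedConn, Go, WaitForGoroutines)
// are hypothetical and do not reflect grpc-go's actual internals.
package sketch

import "sync"

type trackedConn struct {
	goroutines sync.WaitGroup // tracks goroutines that aren't already tracked elsewhere
}

// Go spawns a connection-scoped goroutine and registers it for teardown tracking.
func (c *trackedConn) Go(f func()) {
	c.goroutines.Add(1)
	go func() {
		defer c.goroutines.Done()
		f()
	}()
}

// WaitForGoroutines blocks until every tracked goroutine has exited. Keeping this
// separate from Close() preserves the current "non-blocking" Close behavior.
func (c *trackedConn) WaitForGoroutines() {
	c.goroutines.Wait()
}

A caller that needs deterministic teardown would then call Close() followed by the wait call (or whatever API ends up fitting best).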

I've also considered making this behavior unconditional for Close(). Although that would be the most intuitive approach for me, it's probably not feasible, since it would change the shutdown timing and blocking behavior of the many applications that rely on Close() returning without waiting.

Alternatives Considered

Live with the current situation and build workarounds in scenarios where deterministic teardown is required. However, none of those workarounds will be bullet-proof: things like log statements about connections that have already been closed can't be worked around at all.

Additional Context

So far, I've identified three goroutines that will currently outlive the call to Close():

twz123 avatar Oct 16 '25 09:10 twz123

ClientConn.Close does its best to shut down everything before returning. See: https://github.com/grpc/grpc-go/blob/b8a0fc99579fe3ae9643ab9fc3bad8e7f92077cc/clientconn.go#L1135

But, as you rightly point out, if one of those sub-components spawns a goroutine but does not wait for the goroutine to complete before returning from its shutdown method, there will be some goroutines left hanging.

We do have a leak checker, run as part of teardown in all of our tests, that waits for a preconfigured amount of time and ensures that there are no leaked goroutines. See: https://github.com/grpc/grpc-go/blob/b8a0fc99579fe3ae9643ab9fc3bad8e7f92077cc/internal/grpctest/grpctest.go#L66
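For anyone outside grpc-go (that package is internal), the general shape of such a check is roughly the following. This is a simplified illustration, not the actual grpctest implementation; the helper name and timeout are made up.

// Simplified illustration of a goroutine leak check as a test teardown helper;
// this is not grpc-go's internal grpctest/leakcheck implementation.
package leak_test

import (
	"runtime"
	"testing"
	"time"
)

// expectNoLeakedGoroutines retries for up to timeout, since cleanup goroutines
// may still be winding down when the test body returns.
func expectNoLeakedGoroutines(t *testing.T, baseline int, timeout time.Duration) {
	t.Helper()
	deadline := time.Now().Add(timeout)
	for runtime.NumGoroutine() > baseline {
		if time.Now().After(deadline) {
			buf := make([]byte, 1<<20)
			n := runtime.Stack(buf, true)
			t.Fatalf("goroutines leaked after teardown:\n%s", buf[:n])
		}
		time.Sleep(50 * time.Millisecond)
	}
}

func TestClientTeardown(t *testing.T) {
	baseline := runtime.NumGoroutine()
	defer expectNoLeakedGoroutines(t, baseline, 10*time.Second)
	// ... create a ClientConn, exercise it, and call Close() here ...
}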

The goroutines you've identified in your description do seem valid, but they should be easily fixable in the individual components. We'd be happy to review PRs if you'd like to make those changes.

I'm also curious to know if you are running into specific problems because of these leaked goroutines.

Thanks

easwars avatar Oct 16 '25 23:10 easwars

We'd be happy to review PRs if you'd like to make those changes.

I have an experimental branch for this, indeed.

I'm also curious to know if you are running into specific problems because of these leaked goroutines.

Yes, as stated in the issue description, log statements are being issued after close, which races with the logging subsystem shutdown during graceful application shutdown. It's also just strange in general to see logs being emitted for a connection after it has been closed.

twz123 avatar Oct 17 '25 05:10 twz123

Yes, as stated in the issue description, log statements are being issued after close, which will race with the logging subsystem shutdown during graceful application shutdown

Do you have a concrete repro for this? The go test command will fail the test if a log statement is emitted after the test is marked as complete.
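For reference, that failure mode can be shown in a few lines: if a goroutine calls t.Logf after its test has completed, the test binary panics with "Log in goroutine after ... has completed". This is a self-contained illustration of the go test behavior, not tied to gRPC.

// Self-contained illustration of the go test behavior described above; the
// late t.Logf in the leaked goroutine makes the whole run fail.
package logafter_test

import (
	"testing"
	"time"
)

func TestLogAfterCompletion(t *testing.T) {
	done := make(chan struct{})
	t.Run("sub", func(t *testing.T) {
		go func() {
			defer close(done)
			time.Sleep(50 * time.Millisecond) // outlives the subtest
			t.Logf("late log")                // panics: subtest already completed
		}()
	})
	<-done // keep the parent test (and process) alive until the late log fires
}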

easwars avatar Oct 27 '25 17:10 easwars

I can repro this almost instantly with the following Go program (it shells out to docker to run etcd):

main.go
// SPDX-FileCopyrightText: 2025 Tom Wieczorek
// SPDX-License-Identifier: MIT

// Invoke as follows:
//
//	go build -o grpc-go-issue-8655 main.go && while ./grpc-go-issue-8655; do printf .; done
//
// Use the following as go.mod (and run `go mod tidy`):
//
//	module example.com/twz123/grpc-go-issue-8655
//
//	go 1.25.0
//
//	require (
//		go.etcd.io/etcd/client/pkg/v3 v3.6.5
//		go.etcd.io/etcd/client/v3 v3.6.5
//		google.golang.org/grpc v1.76.0
//	)
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
	"os/exec"
	"os/signal"
	"runtime/pprof"
	"sync"
	"sync/atomic"
	"time"

	client "go.etcd.io/etcd/client/v3"
	"google.golang.org/grpc/grpclog"
)

func main() {
	var buf syncWriter
	if err := run(&buf); err != nil {
		buf.WriteTo(os.Stderr)
		fmt.Fprintln(os.Stderr, "Error:", err)
		os.Exit(1)
	}
}

func run(buf io.Writer) (err error) {
	ctx, cancel := context.WithCancelCause(context.Background())
	signals := make(chan os.Signal, 1)
	signal.Notify(signals, os.Interrupt)
	go func() {
		defer signal.Stop(signals)
		select {
		case signal := <-signals:
			cancel(errors.New(signal.String()))
		case <-ctx.Done():
		}
	}()

	etcd, err := startEtcd(ctx, buf, buf)
	// etcd, err := startEtcd(ctx, io.Discard, io.Discard)
	if err != nil {
		return err
	}

	defer func() {
		cancel(nil)
		<-etcd.done
	}()

	go func() {
		select {
		case <-etcd.done:
			cancel(fmt.Errorf("etcd exited: %w", etcd.err))
		case <-ctx.Done():
		}
	}()

	logger := grpcLogger{
		l: grpclog.NewLoggerV2(buf, buf, buf),
		logAfterClose: func(a any) {
			fmt.Fprintln(buf, "Log after close:", a)
			pprof.Lookup("goroutine").WriteTo(buf, 2)
			cancel(errors.New("log after close"))
		},
	}
	grpclog.SetLoggerV2(&logger)

	if err := waitForEtcd(ctx); err != nil {
		return err
	}

	cfg := client.Config{
		Context:   ctx,
		Endpoints: []string{"http://127.0.0.1:2379"},
	}

	var wg sync.WaitGroup
	var connCount atomic.Int32
	for range 10 {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cli, err := client.New(cfg)
			if err != nil {
				cancel(err)
				return
			}
			if err := cli.Close(); err != nil {
				cancel(err)
				return
			}
			fmt.Fprintln(buf, "Done ", connCount.Add(1))
		}()
	}

	wg.Wait()
	logger.closed.Store(true)

	select {
	case <-ctx.Done():
		return context.Cause(ctx)
	case <-time.After(100 * time.Millisecond):
		return nil
	}
}

type etcd struct {
	done <-chan struct{}
	err  error
}

func startEtcd(ctx context.Context, stdout, stderr io.Writer) (_ *etcd, err error) {
	// Start etcd
	cmd := exec.CommandContext(ctx, "docker", "run", "--rm",
		"-p", "2379:2379", "-p", "2381:2381",
		"--tmpfs", "/etcd-data:rw,size=128m,mode=0700",
		"gcr.io/etcd-development/etcd:v3.6.5",
		"/usr/local/bin/etcd",
		"--name", "grpc-go-issue-8655",
		"--initial-cluster-state", "new",
		"--data-dir", "/etcd-data",
		"--advertise-client-urls", "http://127.0.0.1:2379",
		"--listen-client-urls", "http://0.0.0.0:2379",
		"--listen-client-http-urls", "http://0.0.0.0:2381",
		"--log-level", "warn",
	)
	cmd.Stdout, cmd.Stderr = stdout, stderr
	cmd.Cancel = func() error {
		return cmd.Process.Signal(os.Interrupt)
	}
	cmd.WaitDelay = 10 * time.Second

	done := make(chan struct{})
	etcd := etcd{
		done: done,
		err:  errors.New("etcd didn't exit"),
	}

	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("failed to start etcd: %w", err)
	}
	go func() { defer close(done); etcd.err = cmd.Wait() }()
	return &etcd, nil
}

func waitForEtcd(ctx context.Context) error {
	httpc := &http.Client{
		Transport: &http.Transport{
			DisableKeepAlives: true,
		},
	}

	ctx, cancel := context.WithTimeoutCause(ctx, 10*time.Second, errors.New("etcd did not become healthy in time"))
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, "GET", "http://127.0.0.1:2381/health", nil)
	if err != nil {
		return err
	}

	for {
		resp, err := httpc.Do(req)
		if err == nil {
			body, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			if resp.StatusCode == 200 && bytes.Contains(body, []byte(`"health":"true"`)) {
				return nil
			}

			err = fmt.Errorf("%d: %s", resp.StatusCode, body)
		}

		select {
		case <-time.After(200 * time.Millisecond):
		case <-ctx.Done():
			return fmt.Errorf("%w (%w)", context.Cause(ctx), err)
		}
	}
}

type syncWriter struct {
	buf bytes.Buffer
	mu  sync.Mutex
}

func (s *syncWriter) Write(p []byte) (n int, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.buf.Write(p)
}

func (s *syncWriter) WriteTo(w io.Writer) (int64, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.buf.WriteTo(w)
}

type grpcLogger struct {
	l             grpclog.LoggerV2
	closed        atomic.Bool
	logAfterClose func(any)
}

func (l *grpcLogger) Error(args ...any) {
	if l.closed.Load() {
		l.logAfterClose(fmt.Sprint(args...))
	}
	l.l.Error(args...)
}

func (l *grpcLogger) Errorf(format string, args ...any) {
	if l.closed.Load() {
		l.logAfterClose(fmt.Sprintf(format, args...))
	}
	l.l.Errorf(format, args...)
}

func (l *grpcLogger) Errorln(args ...any) {
	if l.closed.Load() {
		l.logAfterClose(fmt.Sprint(args...))
	}
	l.l.Errorln(args...)
}

func (l *grpcLogger) Fatal(args ...any) {
	if l.closed.Load() {
		l.logAfterClose(fmt.Sprint(args...))
	}
	l.l.Fatal(args...)
}

func (l *grpcLogger) Fatalf(format string, args ...any) {
	if l.closed.Load() {
		l.logAfterClose(fmt.Sprintf(format, args...))
	} else {
		l.l.Fatalf(format, args...)
	}
}

func (l *grpcLogger) Fatalln(args ...any) {
	if l.closed.Load() {
		l.logAfterClose(fmt.Sprint(args...))
	} else {
		l.l.Fatalln(args...)
	}
}

func (l *grpcLogger) Info(args ...any) {
	// if l.closed.Load() {
	// 	l.logAfterClose(fmt.Sprint(args...))
	// } else {
	// 	l.l.Info(args...)
	// }
}

func (l *grpcLogger) Infof(format string, args ...any) {
	// if l.closed.Load() {
	// 	l.logAfterClose(fmt.Sprintf(format, args...))
	// } else {
	// 	l.l.Infof(format, args...)
	// }
}

func (l *grpcLogger) Infoln(args ...any) {
	// if l.closed.Load() {
	// 	l.logAfterClose(fmt.Sprint(args...))
	// } else {
	// 	l.l.Infoln(args...)
	// }
}

func (l *grpcLogger) Warning(args ...any) {
	if l.closed.Load() {
		l.logAfterClose(fmt.Sprint(args...))
	} else {
		l.l.Warning(args...)
	}
}

func (l *grpcLogger) Warningf(format string, args ...any) {
	if l.closed.Load() {
		l.logAfterClose(fmt.Sprintf(format, args...))
	} else {
		l.l.Warningf(format, args...)
	}
}

func (l *grpcLogger) Warningln(args ...any) {
	if l.closed.Load() {
		l.logAfterClose(fmt.Sprintln(args...))
	} else {
		l.l.Warningln(args...)
	}
}

func (*grpcLogger) V(l int) bool {
	// return true
	return false
}

The problem disappears when switching grpc-go to e45e9d78.

This is an example output on my machine:

stderr
{"level":"warn","ts":"2025-10-29T17:45:51.655751Z","caller":"auth/store.go:1135","msg":"simple token is not cryptographically signed"}
2025/10/29 18:45:52 WARNING: [core] [Channel #1 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
2025/10/29 18:45:52 WARNING: [core] [Channel #1 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
2025/10/29 18:45:52 WARNING: [core] [Channel #2 SubChannel #8] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
2025/10/29 18:45:52 WARNING: [core] [Channel #2 SubChannel #8] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
2025/10/29 18:45:52 WARNING: [core] [Channel #3 SubChannel #6] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: write tcp 127.0.0.1:56966->127.0.0.1:2379: use of closed network connection
2025/10/29 18:45:52 WARNING: [core] [Channel #3 SubChannel #6] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: write tcp 127.0.0.1:56966->127.0.0.1:2379: use of closed network connection
2025/10/29 18:45:52 WARNING: [core] [Channel #10 SubChannel #13] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
2025/10/29 18:45:52 WARNING: [core] [Channel #10 SubChannel #13] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
Done  1
Done  2
Done  3
Done  4
Done  5
2025/10/29 18:45:52 WARNING: [core] [Channel #17 SubChannel #25] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
2025/10/29 18:45:52 WARNING: [core] [Channel #17 SubChannel #25] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
Done  6
2025/10/29 18:45:52 WARNING: [core] [Channel #18 SubChannel #22] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
2025/10/29 18:45:52 WARNING: [core] [Channel #18 SubChannel #22] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
2025/10/29 18:45:52 WARNING: [core] [Channel #17 SubChannel #24] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: write tcp 127.0.0.1:57014->127.0.0.1:2379: use of closed network connection
2025/10/29 18:45:52 WARNING: [core] [Channel #17 SubChannel #24] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: write tcp 127.0.0.1:57014->127.0.0.1:2379: use of closed network connection
2025/10/29 18:45:52 WARNING: [core] [Channel #18 SubChannel #21] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: failed to write client preface: write tcp 127.0.0.1:57008->127.0.0.1:2379: use of closed network connection"
2025/10/29 18:45:52 WARNING: [core] [Channel #18 SubChannel #21] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: failed to write client preface: write tcp 127.0.0.1:57008->127.0.0.1:2379: use of closed network connection"
2025/10/29 18:45:52 WARNING: [core] [Channel #26 SubChannel #32] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
2025/10/29 18:45:52 WARNING: [core] [Channel #26 SubChannel #32] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
Done  7
Done  8
2025/10/29 18:45:52 WARNING: [core] [Channel #20 SubChannel #29] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: failed to write client preface: write tcp 127.0.0.1:57054->127.0.0.1:2379: use of closed network connection"
2025/10/29 18:45:52 WARNING: [core] [Channel #20 SubChannel #29] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: failed to write client preface: write tcp 127.0.0.1:57054->127.0.0.1:2379: use of closed network connection"
2025/10/29 18:45:52 WARNING: [core] [Channel #16 SubChannel #38] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
2025/10/29 18:45:52 WARNING: [core] [Channel #16 SubChannel #38] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
Done  9
2025/10/29 18:45:52 WARNING: [core] [Channel #16 SubChannel #28] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "error reading server preface: read tcp 127.0.0.1:57052->127.0.0.1:2379: use of closed network connection"
2025/10/29 18:45:52 WARNING: [core] [Channel #16 SubChannel #28] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "error reading server preface: read tcp 127.0.0.1:57052->127.0.0.1:2379: use of closed network connection"
2025/10/29 18:45:52 WARNING: [core] [Channel #26 SubChannel #33] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: failed to write client preface: write tcp 127.0.0.1:57064->127.0.0.1:2379: use of closed network connection"
2025/10/29 18:45:52 WARNING: [core] [Channel #26 SubChannel #33] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: failed to write client preface: write tcp 127.0.0.1:57064->127.0.0.1:2379: use of closed network connection"
2025/10/29 18:45:52 WARNING: [core] [Channel #20 SubChannel #27] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: write tcp 127.0.0.1:57044->127.0.0.1:2379: use of closed network connection
2025/10/29 18:45:52 WARNING: [core] [Channel #20 SubChannel #27] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: write tcp 127.0.0.1:57044->127.0.0.1:2379: use of closed network connection
2025/10/29 18:45:52 WARNING: [core] [Channel #36 SubChannel #40] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
2025/10/29 18:45:52 WARNING: [core] [Channel #36 SubChannel #40] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"
Done  10
Log after close: [core] [Channel #36 SubChannel #41] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2379", ServerName: "127.0.0.1:2379", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: operation was canceled"

goroutine 158 [running]:
runtime/pprof.writeGoroutineStacks({0xd8afe0, 0xc00038c180})
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/runtime/pprof/pprof.go:756 +0x6b
runtime/pprof.writeGoroutine({0xd8afe0?, 0xc00038c180?}, 0x0?)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/runtime/pprof/pprof.go:745 +0x25
runtime/pprof.(*Profile).WriteTo(0xc8aaba?, {0xd8afe0?, 0xc00038c180?}, 0x2?)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/runtime/pprof/pprof.go:371 +0x14b
main.run.func4({0xb3caa0?, 0xc000f04190?})
	/home/tom/grpc-go-issue-8655/main.go:85 +0x99
main.(*grpcLogger).Warningln(0xc0001300a0, {0xc000f16060?, 0x2?, 0xb3d0e0?})
	/home/tom/grpc-go-issue-8655/main.go:312 +0x4a
google.golang.org/grpc/grpclog.WarningDepth(0xc000f04170?, {0xc000f16060?, 0x10?, 0xc86773?})
	/home/tom/.cache/go/pkg/mod/google.golang.org/[email protected]/grpclog/grpclog.go:155 +0x56
google.golang.org/grpc/grpclog.(*componentData).WarningDepth(0xc0000362a0, 0x2, {0xc000f04150, 0x1, 0x2?})
	/home/tom/.cache/go/pkg/mod/google.golang.org/[email protected]/grpclog/component.go:39 +0x137
google.golang.org/grpc/internal/channelz.AddTraceEvent({0xd9be00, 0xc0000362a0}, {0xd91678, 0xc0001e9100}, 0x1, 0xc000f0d960)
	/home/tom/.cache/go/pkg/mod/google.golang.org/[email protected]/internal/channelz/trace.go:202 +0x1d4
google.golang.org/grpc/internal/channelz.Warningf({0xd9be00, 0xc0000362a0}, {0xd91678, 0xc0001e9100}, {0xcb6a81?, 0x0?}, {0xc000f0db98?, 0x0?, 0xc8e8d0?})
	/home/tom/.cache/go/pkg/mod/google.golang.org/[email protected]/internal/channelz/logging.go:55 +0x92
google.golang.org/grpc.(*addrConn).createTransport(_, {_, _}, {{0xc93ad5, 0xe}, {0xc93ad5, 0xe}, 0x0, 0x0, {0x0, ...}}, ...)
	/home/tom/.cache/go/pkg/mod/google.golang.org/[email protected]/clientconn.go:1407 +0x765
google.golang.org/grpc.(*addrConn).tryAllAddrs(0xc0007ec608, {0xd93dd0, 0xc0007545f0}, {0xc0007d03c0?, 0xc000498528?, 0xc00046dea8?}, {0x505f13?, 0xc000498500?, 0x1327d40?})
	/home/tom/.cache/go/pkg/mod/google.golang.org/[email protected]/clientconn.go:1347 +0x468
google.golang.org/grpc.(*addrConn).resetTransportAndUnlock(0xc0007ec608)
	/home/tom/.cache/go/pkg/mod/google.golang.org/[email protected]/clientconn.go:1279 +0x1c5
google.golang.org/grpc.(*addrConn).connect(0xc0007ec608)
	/home/tom/.cache/go/pkg/mod/google.golang.org/[email protected]/clientconn.go:942 +0x13a
created by google.golang.org/grpc.(*acBalancerWrapper).Connect in goroutine 134
	/home/tom/.cache/go/pkg/mod/google.golang.org/[email protected]/balancer_wrapper.go:354 +0x56

goroutine 1 [select]:
main.run({0xd8afe0, 0xc00038c180})
	/home/tom/grpc-go-issue-8655/main.go:122 +0x756
main.main()
	/home/tom/grpc-go-issue-8655/main.go:42 +0x32

goroutine 18 [syscall]:
os/signal.signal_recv()
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/runtime/sigqueue.go:152 +0x29
os/signal.loop()
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/signal/signal_unix.go:23 +0x13
created by os/signal.Notify.func1.1 in goroutine 1
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/signal/signal.go:152 +0x1f

goroutine 19 [select]:
main.run.func1()
	/home/tom/grpc-go-issue-8655/main.go:55 +0xc7
created by main.run in goroutine 1
	/home/tom/grpc-go-issue-8655/main.go:53 +0x131

goroutine 20 [IO wait]:
internal/poll.runtime_pollWait(0x7f9b15221e00, 0x72)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/runtime/netpoll.go:351 +0x85
internal/poll.(*pollDesc).wait(0xc00012e060?, 0xc00041a000?, 0x1)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00012e060, {0xc00041a000, 0x8000, 0x8000})
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/internal/poll/fd_unix.go:165 +0x279
os.(*File).read(...)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/file_posix.go:29
os.(*File).Read(0xc00012c008, {0xc00041a000?, 0x47e1bf?, 0x8?})
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/file.go:144 +0x4f
io.copyBuffer({0xd8afe0, 0xc00038c180}, {0xd8b8a0, 0xc000418000}, {0x0, 0x0, 0x0})
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/io/io.go:429 +0x190
io.Copy(...)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/io/io.go:388
os.genericWriteTo(0xc00012c008?, {0xd8afe0, 0xc00038c180})
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/file.go:295 +0x4f
os.(*File).WriteTo(0xc00012c008, {0xd8afe0, 0xc00038c180})
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/file.go:273 +0x9c
io.copyBuffer({0xd8afe0, 0xc00038c180}, {0xd8b0a0, 0xc00012c008}, {0x0, 0x0, 0x0})
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/io/io.go:411 +0x9d
io.Copy(...)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/io/io.go:388
os/exec.(*Cmd).writerDescriptor.func1()
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/exec/exec.go:596 +0x34
os/exec.(*Cmd).Start.func2(0x0?)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/exec/exec.go:749 +0x2c
created by os/exec.(*Cmd).Start in goroutine 1
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/exec/exec.go:748 +0x935

goroutine 21 [select]:
os/exec.(*Cmd).watchCtx(0xc000116000, 0xc000126070)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/exec/exec.go:789 +0xb2
created by os/exec.(*Cmd).Start in goroutine 1
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/exec/exec.go:775 +0x8fd

goroutine 22 [syscall]:
syscall.Syscall6(0xf7, 0x3, 0xc, 0xc0000ba648, 0x4, 0xc000422000, 0x0)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/syscall/syscall_linux.go:96 +0x39
internal/syscall/unix.Waitid(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/internal/syscall/unix/waitid_linux.go:18 +0x39
os.(*Process).pidfdWait.func1(...)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/pidfd_linux.go:109
os.ignoringEINTR(...)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/file_posix.go:256
os.(*Process).pidfdWait(0xc000156000)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/pidfd_linux.go:108 +0x1a9
os.(*Process).wait(0x0?)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/exec_unix.go:25 +0x1a
os.(*Process).Wait(...)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/exec.go:340
os/exec.(*Cmd).Wait(0xc000116000)
	/nix/store/k1kn1c59ss7nq6agdppzq3krwmi3xqy4-go-1.25.2/share/go/src/os/exec/exec.go:922 +0x45
main.startEtcd.func2()
	/home/tom/grpc-go-issue-8655/main.go:165 +0x54
created by main.startEtcd in goroutine 1
	/home/tom/grpc-go-issue-8655/main.go:165 +0x2a9

goroutine 23 [select]:
main.run.func3()
	/home/tom/grpc-go-issue-8655/main.go:74 +0x78
created by main.run in goroutine 1
	/home/tom/grpc-go-issue-8655/main.go:73 +0x225
{"level":"error","ts":"2025-10-29T17:45:52.480891Z","caller":"embed/etcd.go:912","msg":"setting up serving from embedded etcd failed.","error":"http: Server closed","stacktrace":"go.etcd.io/etcd/server/v3/embed.(*Etcd).errHandler\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:912\ngo.etcd.io/etcd/server/v3/embed.(*serveCtx).startHandler.func1\n\tgo.etcd.io/etcd/server/v3/embed/serve.go:90"}
{"level":"error","ts":"2025-10-29T17:45:52.480981Z","caller":"embed/etcd.go:912","msg":"setting up serving from embedded etcd failed.","error":"accept tcp [::]:2381: use of closed network connection","stacktrace":"go.etcd.io/etcd/server/v3/embed.(*Etcd).errHandler\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:912\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).startHandler.func1\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:906"}
{"level":"error","ts":"2025-10-29T17:45:52.487282Z","caller":"embed/etcd.go:912","msg":"setting up serving from embedded etcd failed.","error":"accept tcp 127.0.0.1:2380: use of closed network connection","stacktrace":"go.etcd.io/etcd/server/v3/embed.(*Etcd).errHandler\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:912\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).startHandler.func1\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:906"}
Error: log after close

The above logs also showcase a lot of (IMO inappropriate) warnings, as mentioned in:

  • #8654

twz123 avatar Oct 29 '25 17:10 twz123