
ServerWatchStream can't be closed when client closed watcher

Open chaofengliu-okg opened this issue 5 months ago • 7 comments

Bug report criteria

What happened?

The ServerWatchStream is not closed after the client closes its watcher.

What did you expect to happen?

When the client closes its watcher, the ServerWatchStream should be closed.

How can we reproduce it (as minimally and precisely as possible)?

Using jetcd, write the following code and run it:

```java
while (true) {
    Watch.Watcher watcher = etcdClient.getWatchClient().watch(....);
    TimeUnit.SECONDS.sleep(5);
    watcher.close();
}
```

Anything else we need to know?

In `func (ws *watchServer) Watch(stream pb.Watch_WatchServer) (err error)`, `sws.recvLoop()` returns nil when it receives EOF, so the line `errc <- rerr` is never executed. As a result, the select below it never fires, and the serverWatchStream leaks.

Etcd version (please run commands below)

$ etcd --version
# 3.5.17

$ etcdctl version
# 3.5.17

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output


chaofengliu-okg avatar Aug 14 '25 02:08 chaofengliu-okg

Yeah, although this PR seems to fix some issues, I don't see any difference.

Adding a `<-sws.closec:` branch for when a cancel watch request is received does not help:

```go
select {
case sws.ctrlStream <- wr:
case <-sws.closec:
	return nil
}
```

exceptionplayer avatar Aug 14 '25 02:08 exceptionplayer

Any updates on this? This is a memory leak and may cause catastrophic problems. We are hoping this bug can be fixed.

adamsau-okg avatar Sep 02 '25 06:09 adamsau-okg

@serathius @apullo777 can you help and take a look at this issue?

adamsau-okg avatar Sep 02 '25 12:09 adamsau-okg

@ahrtr would you mind taking a look?

adamsau-okg avatar Sep 03 '25 03:09 adamsau-okg

Hi,

I'm new to contributing to etcd and have been studying the codebase. I believe I found a solution to the memory leak issue.

The memory leak occurs due to improper resource cleanup in serverWatchStream.Watch():

  1. Race condition: when the client cancels the context quickly, the select statement chooses `<-stream.Context().Done()` before `errc` receives the error from `recvLoop`, causing incomplete cleanup.

  2. EOF case: when `recvLoop()` returns nil (EOF), no signal is sent to `errc`, so the select hangs indefinitely.

The fix involves three key changes:

  1. Added synchronization channel to ensure recvLoop goroutine always completes
  2. Fixed EOF case by sending io.EOF to errc when recvLoop() returns nil
  3. Unified cleanup by always closing ctrlStream regardless of which select case triggers
```go
func (ws *watchServer) Watch(stream pb.Watch_WatchServer) (err error) {
	// ... existing code ...

	errc := make(chan error, 1)
	done := make(chan struct{}) // NEW: synchronization channel

	go func() {
		defer close(done) // NEW: signal completion
		if rerr := sws.recvLoop(); rerr != nil {
			// ... existing error handling ...
			errc <- rerr
		} else {
			// NEW: handle the EOF case to prevent a hanging select
			errc <- io.EOF
		}
	}()

	select {
	case err = <-errc:
		if errors.Is(err, context.Canceled) {
			err = rpctypes.ErrGRPCWatchCanceled
		}
	case <-stream.Context().Done():
		err = stream.Context().Err()
		if errors.Is(err, context.Canceled) {
			err = rpctypes.ErrGRPCWatchCanceled
		}
		// NEW: wait for recvLoop to complete to prevent resource leaks
		<-done
	}

	// NEW: always close ctrlStream for proper cleanup
	close(sws.ctrlStream)
	sws.close()

	return err
}
```

Would appreciate feedback on this approach.

Siomachkin avatar Sep 07 '25 21:09 Siomachkin

@Siomachkin thank you for your reply. We have thought about your "Fixed EOF case by sending io.EOF to errc when recvLoop() returns nil". We think it would be normal for a client to half-close the write side of the stream: if a client decides it no longer needs to send anything to the etcd server, it can close the write side but may still want to be notified when the keys it is watching change. So when the client half-closes, the etcd server gets an EOF, but the server should not close the stream.

One possible solution would be for the server to close the stream once it has no more active watchers. But one thing needs to be considered: the watch gRPC method is defined as a bidirectional streaming call, so I was wondering whether the client is allowed to send more than one watch request to the etcd server on a single stream. If it is, then the server should do nothing, and the client should be responsible for closing all the streams.

exceptionplayer avatar Sep 09 '25 05:09 exceptionplayer

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Nov 09 '25 00:11 github-actions[bot]