go-dockerclient
go-dockerclient copied to clipboard
RemoveEventListener cause connection leak
Recently both our app and docker engine found fd count increase in a short time, after looking into the go-dockerclient source code and test, we find the AddEventListener and RemoveEventListener cause the problem: RemoveEventListener will not close connection, but after remove all listener and call AddEventListener again, the client will create a new connection.
After looking into the code, we find that the eventHijack
create the gorountine and will not return when we call RemoveEventListener :
https://github.com/fsouza/go-dockerclient/blob/main/event.go#L383
Now the newest go-dockerclient version has this problem.
The test:
package main
import (
"fmt"
"io/ioutil"
"log"
"os"
"time"
docker "github.com/fsouza/go-dockerclient"
)
func main() {
fmt.Println("Start running daemon")
//Init Docker client
endpoint := "unix:///var/run/docker.sock"
if len(os.Args) < 3 {
fmt.Printf(`Usage %s [unix-socket-path] [fd-path]
example: %s unix:///var/run/docker.sock /proc/$(pidof docker)/fd`, os.Args[0], os.Args[0])
os.Exit(2)
}
endpoint = os.Args[1]
fdDir := os.Args[2]
client, err := docker.NewClient(endpoint)
if err != nil {
log.Fatal(err)
}
for {
connect(client, endpoint, fdDir)
}
}
func connect(client *docker.Client, endpoint, fdDir string) {
arr, err := ioutil.ReadDir(fdDir)
if err != nil {
log.Fatalf("read fd dir error %s: %v", fdDir, err)
}
fmt.Println(time.Now().Format("Mon Jan 02 15:04:05"), "before connect fd count:", len(arr))
events := make(chan *docker.APIEvents)
client.AddEventListener(events)
defer func() {
client.RemoveEventListener(events)
}()
timeout := time.After(3 * time.Second)
select {
case msg := <-events:
if msg.Status == "start" {
fmt.Println("EVENT:" + msg.Status)
}
case <-timeout:
break
}
}
Thanks for the detailed issue and for providing a reproducer!
Yeah, if nothing is listening for the events we should probably close the connection. I'll take a closer look at this on Thursday.
Is there a plan to fix this issue as there is an official alternative now?
Is there a plan to fix this issue as there is an official alternative now?
Yeah I plan to get to it before the end of the Summer, just been away for a while.
@yyuuttaaoo @shabicheng can you try #928?
After testing, this patch does not work. So I think the issue should be reopened. First, let me fix a small bug in the test program. $(pidof docker) should be $(pidof dockerd). Now we can use the test program to test. We can see, the fd number is still climbing. Why? There are 2 reasons:
- If there is no new event for some reason, the json.Decode will block forever.
- res.Body.Close actually hangs forever if it is called before conn.Close.
There are 2 other problems in this solution:
- There is a chance that eventChannel and errorChannel are closed before the last event was sent to the channel. This may cause panic.
- If the last event was an error, it will cause reconnect regardless of the number of listeners.
So I have made another patch based on the current one. https://github.com/fsouza/go-dockerclient/pull/930
Note:
- The patch still contains some debug printf code, because it is not fully tested.
- I have not run any UT for regression yet.
- I've only test it on Linux using unencrypted docker.sock. SSH/TLS are not tested. Windows platform is not tested.
@fsouza Would you help/coauthor to make the new patch complete?
@yyuuttaaoo thank you for detailed explanation! I can definitely work on that patch, but this week is a bit busy. I can come around some time next week and look into that patch and make sure it's ready for prime time!