
Connection Issue to Liftbridge Service from Dockerized Go Client

Open umermasood opened this issue 3 months ago • 0 comments

Problem

I am experiencing an issue where my Dockerized Go application using the go-liftbridge client fails to connect to the Liftbridge service, despite correct DNS resolution and explicitly providing the resolved IP and port. The client attempts to connect to [::1]:9292 instead of the provided IP address.

Environment

  • Liftbridge Version: v1.9.0
  • Go Liftbridge Client Version: v2.3.0
  • Go Version: 1.22
  • Operating System: Pop!_OS 22.04 LTS x86_64
  • Docker Version: v26.0.1
  • Docker Compose Version: v2.26.1

Steps to Reproduce

  1. Dockerize a simple Go application that uses the go-liftbridge client.
  2. Use Docker Compose to manage the Liftbridge service and the Go application.
  3. Try to connect to the Liftbridge service using an IP address resolved from the service's hostname within Docker Compose.

The connection string is passed correctly, and even hardcoding the IP address directly in the client still results in attempts to connect to [::1].

Expected Behavior

The client should connect to the Liftbridge service using the provided IP address and port.

Actual Behavior

The client ignores the provided IP address and port and attempts to connect to [::1]:9292, resulting in a connection refused error.

Docker Compose

services:
  nats:
    image: nats:latest
    ports:
      - "4222:4222"
      - "6222:6222"
      - "8222:8222"

  liftbridge:
    image: liftbridge/liftbridge:latest
    command:
      - "--raft-bootstrap-seed"
      - "--nats-servers=nats://nats:4222"
      - "--data-dir=/tmp/liftbridge/"
    ports:
      - "9292:9292"
    depends_on:
      - nats
    volumes:
      - ./liftbridge-data:/tmp/liftbridge

  go-liftbridge-spark-connector:
    image: go-liftbridge-spark-connector-img-no-tcp
    depends_on:
      - liftbridge
    command: ["-liftbridgeIP", "liftbridge", "-liftbridgePort", "9292"]

Go Application Code

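The Go application resolves the service name and attempts to connect to the Liftbridge service using the resolved address. Connecting with the service name directly (the commented-out call) fails the same way.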

package main

import (
	"context"
	"flag"
	"log"
	"net"
	"os"
	"os/signal"
	"time"

	lift "github.com/liftbridge-io/go-liftbridge/v2"
)

type config struct {
	liftbridgeIP   string
	liftbridgePort string
}

func main() {
	var cfg config

	// Define the flags and set their default values
	flag.StringVar(&cfg.liftbridgeIP, "liftbridgeIP", "localhost", "Liftbridge IP address")
	flag.StringVar(&cfg.liftbridgePort, "liftbridgePort", "9292", "Liftbridge Port")

	flag.Parse()

	// Wait for services to be up
	time.Sleep(5 * time.Second)

	// DNS resolution check: resolve the IP address of the Liftbridge server
	addrs, err := net.LookupHost(cfg.liftbridgeIP)
	if err != nil {
		log.Fatalf("Failed to resolve host: %v", err)
	}
	log.Printf("Resolved liftbridge to: %v", addrs)

	// I even tried using the resolved IP address.
	// net.JoinHostPort also brackets IPv6 literals correctly.
	address := net.JoinHostPort(addrs[0], cfg.liftbridgePort)
	log.Printf("Connecting to Liftbridge server at %s", address)

	// Connect to the Liftbridge server using the resolved address
	client, err := connectLiftbridgeWithRetry([]string{address}, 10, 2*time.Second)

	// Connect to the Liftbridge server using the service name
	//client, err := connectLiftbridgeWithRetry([]string{cfg.liftbridgeIP + ":" + cfg.liftbridgePort}, 5, 2*time.Second)
	if err != nil {
		log.Fatalf("Failed to connect to Liftbridge server: %v", err)
	}
	defer client.Close()

	// ... do the rest of the work with client here

	// Block until the process is interrupted (ctx is assumed to be a
	// signal-bound context; its definition was omitted from the snippet).
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()
	<-ctx.Done()
}

// connectLiftbridgeWithRetry calls lift.Connect up to maxRetries times,
// sleeping for delay between failed attempts.
func connectLiftbridgeWithRetry(addresses []string, maxRetries int, delay time.Duration) (lift.Client, error) {
	var client lift.Client
	var err error
	for i := 0; i < maxRetries; i++ {
		log.Printf("Attempt %d to connect to Liftbridge at addresses: %v", i+1, addresses)
		client, err = lift.Connect(addresses)
		if err == nil {
			return client, nil
		}
		log.Printf("Failed to connect to Liftbridge (%d/%d): %v", i+1, maxRetries, err)
		time.Sleep(delay)
	}
	return nil, err
}

Observed Behavior

Go application logs show the following:

go-liftbridge-spark-connector-1  | 2024/05/14 07:48:24 Resolved liftbridge to: [172.27.0.3]
go-liftbridge-spark-connector-1  | 2024/05/14 07:48:24 Connecting to Liftbridge server at liftbridge:9292
go-liftbridge-spark-connector-1  | 2024/05/14 07:48:24 Attempt 1 to connect to Liftbridge at addresses: [liftbridge:9292]
go-liftbridge-spark-connector-1  | 2024/05/14 07:48:24 Failed to connect to Liftbridge (1/5): rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp [::1]:9292: connect: connection refused"

Remaining attempts also fail with the same error.
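
The address in the error, [::1], is the IPv6 loopback address, which inside a container refers to the container itself rather than to 172.27.0.3. To make that mismatch easy to spot in logs, here is a minimal sketch that classifies a "host:port" string as loopback or not. The helper name isLoopback is hypothetical and not part of go-liftbridge; it uses only the standard library and does not resolve hostnames.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopback is a hypothetical helper. It reports whether the host part of a
// "host:port" string is a literal loopback IP such as 127.0.0.1 or ::1.
// Hostnames are not resolved, so "liftbridge:9292" yields false.
func isLoopback(hostport string) bool {
	host, _, err := net.SplitHostPort(hostport)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	// The first address is the one passed to the client, the second is the
	// one that appears in the gRPC error above.
	for _, addr := range []string{"172.27.0.3:9292", "[::1]:9292"} {
		fmt.Printf("%s loopback=%v\n", addr, isLoopback(addr))
	}
}
```

If the configured address is not loopback but the dialed one is, the dialed address is presumably coming from somewhere other than the list passed to lift.Connect. Where exactly it comes from is not established here.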

What I Have Tried

  1. Network Connectivity: Confirmed that the Go container can resolve and connect to the Liftbridge container using nc:

    nc -zv liftbridge 9292
    Connection to liftbridge (172.27.0.3) 9292 port [tcp/*] succeeded!
    
  2. Docker Compose Configuration: Ensured all services are on the same network and using the correct service names.

  3. Simplified Go Code: Reduced the Go application to just attempt to connect to Liftbridge and log success or failure.
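
The nc check in item 1 can also be reproduced from inside the Go process, which rules out differences between the shell tooling and the Go runtime's resolver and dialer. The following is a minimal sketch using only the standard library. The helper name checkTCP is hypothetical and not part of go-liftbridge, and it tests plain TCP reachability only, not the gRPC handshake.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"os"
	"time"
)

// checkTCP is a hypothetical helper. It opens a plain TCP connection to addr
// ("host:port") and returns the remote address that was actually dialed, so
// it can be compared with the address reported in the gRPC error.
func checkTCP(addr string, timeout time.Duration) (string, error) {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return "", fmt.Errorf("dial %s: %w", addr, err)
	}
	defer conn.Close()
	return conn.RemoteAddr().String(), nil
}

func main() {
	// Expects the target as the first argument, e.g. "liftbridge:9292".
	if len(os.Args) < 2 {
		log.Println("usage: tcpcheck host:port")
		return
	}
	remote, err := checkTCP(os.Args[1], 2*time.Second)
	if err != nil {
		log.Printf("TCP check failed: %v", err)
		return
	}
	log.Printf("TCP check succeeded, remote address: %s", remote)
}
```

Run inside the client container with liftbridge:9292 as the argument. A success here combined with the [::1] failure from the client would suggest the problem lies in which address the client dials, not in container networking.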

Question

This issue seems to occur only in the Dockerized environment; the same setup works when run outside Docker. Any guidance or suggestions on what might be causing this behavior would be greatly appreciated.


umermasood • May 13 '24 10:05