temporal Docker fails to bind on multiple interfaces

Open hazcod opened this issue 5 years ago • 6 comments

Expected Behavior

temporalio/auto-setup:latest should bind to 0.0.0.0 in Docker scenarios instead of binding to specific IPs.

Actual Behavior

Attaching multiple networks to the temporal Docker container results in:

 {"level":"fatal","ts":"2020-06-19T08:24:43.081Z","msg":"ListenIP failed, unable to parse bindOnIP value %q or it is not IPv4 address","address":"172.23.0.4 172.19.0.3","logging-call-at":"rpc.go:186","stacktrace":"github.com/temporalio/temporal/common/log/loggerimpl.(*loggerImpl).Fatal\n\t/temporal/common/log/loggerimpl/logger.go:144\ngithub.com/temporalio/temporal/common/rpc.getListenIP\n\t/temporal/common/rpc/rpc.go:186\ngithub.com/temporalio/temporal/common/rpc.(*RPCFactory).GetGRPCListener\n\t/temporal/common/rpc/rpc.go:126\ngithub.com/temporalio/temporal/common/resource.New\n\t/temporal/common/resource/resourceImpl.go:154\ngithub.com/temporalio/temporal/service/history.NewService\n\t/temporal/service/history/service.go:471\ngithub.com/temporalio/temporal/cmd/server/temporal.(*server).startService\n\t/temporal/cmd/server/temporal/server.go:262\ngithub.com/temporalio/temporal/cmd/server/temporal.(*server).Start\n\t/temporal/cmd/server/temporal/server.go:85\ngithub.com/temporalio/temporal/cmd/server/temporal.startHandler\n\t/temporal/cmd/server/temporal/temporal.go:91\ngithub.com/temporalio/temporal/cmd/server/temporal.BuildCLI.func1\n\t/temporal/cmd/server/temporal/temporal.go:211\ngithub.com/urfave/cli.HandleAction\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:528\ngithub.com/urfave/cli.Command.Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/command.go:174\ngithub.com/urfave/cli.(*App).Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:279\nmain.main\n\t/temporal/cmd/server/main.go:38\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}

Steps to Reproduce the Problem

  temporal:
    image: temporalio/auto-setup:latest
    restart: "on-failure:5"
    networks:
    - backend
    - backend2
    environment:
      - "DB=postgres"
      - "DB_PORT=26257"
      - "POSTGRES_USER=root"
      - "POSTGRES_PWD=postgres"
      - "POSTGRES_SEEDS=postgres"
    ports:
    - 7233

hazcod avatar Jun 19 '20 08:06 hazcod

I'm hitting the same error on Azure App Service. My docker-compose.yml is very similar to the default one, just without a cassandra container. No explicit networks configuration.

I get

{"level":"fatal","ts":"2020-09-07T21:00:28.845Z","msg":"ListenIP failed, unable to parse bindOnIP value or it 
is not IPv4 address","address":"172.16.3.2 172.16.0.3","logging-call-at":"rpc.go:186","stacktrace":
"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Fatal\n\t/temporal/common/log/loggerimpl/logger.go:144
\ngo.temporal.io/server/common/rpc.getListenIP\n\t/temporal/common/rpc/rpc.go:186
\ngo.temporal.io/server/common/rpc.(*RPCFactory).GetGRPCListener\n\t/temporal/common/rpc/rpc.go:126
\ngo.temporal.io/server/common/resource.New\n\t/temporal/common/resource/resourceImpl.go:154
\ngo.temporal.io/server/service/history.NewService\n\t/temporal/service/history/service.go:479
\ngo.temporal.io/server/cmd/server/temporal.(*server).startService\n\t/temporal/cmd/server/temporal/server.go:265
\ngo.temporal.io/server/cmd/server/temporal.(*server).Start\n\t/temporal/cmd/server/temporal/server.go:85
\ngo.temporal.io/server/cmd/server/temporal.startHandler\n\t/temporal/cmd/server/temporal/temporal.go:91
\ngo.temporal.io/server/cmd/server/temporal.BuildCLI.func1\n\t/temporal/cmd/server/temporal/temporal.go:211
\ngithub.com/urfave/cli.HandleAction\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:528
\ngithub.com/urfave/cli.Command.Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/command.go:174
\ngithub.com/urfave/cli.(*App).Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:279
\nmain.main\n\t/temporal/cmd/server/main.go:38\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}

Any workarounds?

mikhailshilkov avatar Sep 07 '20 21:09 mikhailshilkov

@wxing1292 can you confirm if this is still a problem?

samarabbas avatar Jul 03 '21 23:07 samarabbas

I got the same error as soon as I added the second network:

networks:
    - backend
    - backend2

Miha-ha avatar Jul 28 '21 14:07 Miha-ha

Could you please clarify whether a solution to this problem is planned?

Miha-ha avatar Aug 09 '21 13:08 Miha-ha

Craziness. I broke my brain solving this, especially since there is no documentation for:

  • this config: https://github.com/temporalio/temporal/blob/master/docker/config_template.yaml
  • this config: https://github.com/temporalio/temporal/blob/master/config/dynamicconfig/development_es.yaml
  • an explanation of how exactly the second of the above-mentioned configs should be used to override the first one. They have different structures! I cannot see what they have in common at all: they do not seem to share any keys, at least in the examples linked above. For instance, you won't find the word "system" in the file the first link points to. The only documentation is a "you can" here. Wow... Thanks... It helped (sarcasm).
  • an explanation of the environment variables used by the temporal server. The only explanation on the temporal.io website is about temporal-web, not about the temporal server itself.

So after an evening of investigating, I came across these issues:

  • https://community.temporal.io/t/temporalio-temporal-server-overwrite-the-127-0-0-1-7233-ip-address-to-something-else/544
  • https://github.com/temporalio/temporal/blob/master/docker/config_template.yaml#L214
  • https://github.com/temporalio/temporal/blob/master/config/dynamicconfig/README.md

From these I indirectly understood that I could bind to 0.0.0.0. Of course I could; that was the first thing I tried, but I immediately hit another error: I had to provide a broadcastAddress. I had no idea how (see my remarks about the docs above), and I still have no idea how to set it via the config file. I also understood that broadcastAddress is used only for cluster intercommunication. Those issues also mention a few interesting env vars.

So this seems to work:

    temporal:
        depends_on:
            - mysql
            - elasticsearch
        environment:
            DBNAME: temporal
            VISIBILITY_DBNAME: temporal_visibility
            DB: mysql
            MYSQL_USER: temporal
            MYSQL_PWD: <passwd_here>
            MYSQL_SEEDS: mysql
            DYNAMIC_CONFIG_FILE_PATH: config/dynamicconfig/development_es.yaml
            ENABLE_ES: "true"
            ES_SEEDS: elasticsearch
            ES_VERSION: v7
            BIND_ON_IP: 0.0.0.0
            TEMPORAL_BROADCAST_ADDRESS: 127.0.0.1
        image: temporalio/auto-setup:1.14.0
        volumes:
            - ./dynamicconfig:/etc/temporal/config/dynamicconfig
        networks:
            - default # perform DB queries
            - traefik # receive requests from a load balancer
        labels:
            traefik.enable: 'true'
            traefik.frontend.rule: 'Host: temporal.local'
            traefik.port: '7233'
            traefik.protocol: 'h2c'

The solution is the last two env vars: you can bind to 0.0.0.0 while still using 127.0.0.1 for cluster intercommunication:

            BIND_ON_IP: 0.0.0.0
            TEMPORAL_BROADCAST_ADDRESS: 127.0.0.1

It seems not many people encounter this problem if it went unanswered for 1.5 years. Does everyone just expose a new port for every single service instead of using a custom CA or Let's Encrypt and a load balancer? Really? Guuuuyz! How do you even remember all those port numbers?

And one more thing: this is not a bug, because:

  • You are able to bind to 0.0.0.0
  • You have to know your node's IP address to set up a production cluster environment (when 127.0.0.1 is not an option)
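
A minimal sketch of the second point, assuming a multi-node setup where other nodes must reach this one. The two variable names are the ones from the working config above; the address 192.0.2.10 is a hypothetical placeholder (a documentation-range IP), and the surrounding service layout is an assumption:

```yaml
services:
    temporal:
        image: temporalio/auto-setup:1.14.0
        environment:
            # Listen on every interface inside the container
            BIND_ON_IP: 0.0.0.0
            # Hypothetical node address: replace with an IP that the other
            # cluster members can actually reach
            TEMPORAL_BROADCAST_ADDRESS: 192.0.2.10
```

The listening address and the advertised address are kept separate here: the first can stay generic, while the second has to be a concrete address per node.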

As far as I'm concerned, the issue could be closed, since the above solution seems to work, though it would be nice to improve the docs. Yes, I know I'm free to contribute instead of complaining :-)

BTW, another solution would be a way to disable cluster intercommunication entirely, if it is optional. I believe it is optional if 127.0.0.1 is okay.

programmador avatar Dec 21 '21 18:12 programmador

Adding a healthcheck to temporal is tough: I can't curl localhost because the server is not bound to 127.0.0.1.

    healthcheck:
      test: ["CMD-SHELL", "curl -s http://localhost:7233/health || exit 1"]

aleclerc-cio avatar Jul 31 '24 18:07 aleclerc-cio

If temporal is bound to a specific interface, use test: ["CMD-SHELL", 'temporal operator cluster health --address $(hostname -i):7233']
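
For the other case, where the server binds to 0.0.0.0 as in the workaround earlier in this thread, the loopback address is covered too, so the check can target 127.0.0.1 directly. This is only a sketch: it assumes the `temporal` CLI is present in the image you run (older images may not ship it), and the interval, timeout, and retry values are illustrative, not recommendations:

```yaml
services:
    temporal:
        environment:
            BIND_ON_IP: 0.0.0.0
            TEMPORAL_BROADCAST_ADDRESS: 127.0.0.1
        healthcheck:
            # Port 7233 is the port exposed in the configs above; the health
            # command is the same one used in the comment above
            test: ["CMD-SHELL", "temporal operator cluster health --address 127.0.0.1:7233"]
            interval: 10s
            timeout: 5s
            retries: 5
```

In a compose file the healthcheck command goes under the `test` key of the `healthcheck` block.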

emalihin avatar Oct 03 '25 08:10 emalihin