temporal Docker fails to bind on multiple interfaces
Expected Behavior
temporalio/auto-setup:latest should bind to 0.0.0.0 in Docker scenarios instead of binding to specific IPs.
Actual Behavior
Attaching multiple networks to the temporal Docker container results in:
{"level":"fatal","ts":"2020-06-19T08:24:43.081Z","msg":"ListenIP failed, unable to parse bindOnIP value %q or it is not IPv4 address","address":"172.23.0.4 172.19.0.3","logging-call-at":"rpc.go:186","stacktrace":"github.com/temporalio/temporal/common/log/loggerimpl.(*loggerImpl).Fatal\n\t/temporal/common/log/loggerimpl/logger.go:144\ngithub.com/temporalio/temporal/common/rpc.getListenIP\n\t/temporal/common/rpc/rpc.go:186\ngithub.com/temporalio/temporal/common/rpc.(*RPCFactory).GetGRPCListener\n\t/temporal/common/rpc/rpc.go:126\ngithub.com/temporalio/temporal/common/resource.New\n\t/temporal/common/resource/resourceImpl.go:154\ngithub.com/temporalio/temporal/service/history.NewService\n\t/temporal/service/history/service.go:471\ngithub.com/temporalio/temporal/cmd/server/temporal.(*server).startService\n\t/temporal/cmd/server/temporal/server.go:262\ngithub.com/temporalio/temporal/cmd/server/temporal.(*server).Start\n\t/temporal/cmd/server/temporal/server.go:85\ngithub.com/temporalio/temporal/cmd/server/temporal.startHandler\n\t/temporal/cmd/server/temporal/temporal.go:91\ngithub.com/temporalio/temporal/cmd/server/temporal.BuildCLI.func1\n\t/temporal/cmd/server/temporal/temporal.go:211\ngithub.com/urfave/cli.HandleAction\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:528\ngithub.com/urfave/cli.Command.Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/command.go:174\ngithub.com/urfave/cli.(*App).Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:279\nmain.main\n\t/temporal/cmd/server/main.go:38\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
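For anyone wondering why this value is rejected: the "address" field holds two IPs joined by a space, and Go's standard `net.ParseIP` only accepts a single address. Below is a minimal sketch of that kind of check. `parseIPv4` is a hypothetical helper written for illustration, not the actual code in rpc.go, and where the space-joined value comes from (possibly something like `hostname -i` inside the container) is an assumption on my part.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseIPv4 mimics the kind of check the fatal log describes: the whole
// string must be exactly one IPv4 address.
// Hypothetical helper for illustration only, not Temporal's implementation.
func parseIPv4(value string) (net.IP, error) {
	ip := net.ParseIP(value)
	if ip == nil || ip.To4() == nil {
		return nil, fmt.Errorf("unable to parse %q or it is not an IPv4 address", value)
	}
	return ip.To4(), nil
}

func main() {
	// The exact value from the "address" field of the log above.
	combined := "172.23.0.4 172.19.0.3"
	if _, err := parseIPv4(combined); err != nil {
		fmt.Println("combined value:", err)
	}

	// Each address parses fine on its own; only the space-joined form fails.
	for _, field := range strings.Fields(combined) {
		ip, err := parseIPv4(field)
		fmt.Println(field, "->", ip, err)
	}
}
```

So the failure looks like a consequence of the container having one IP per attached network, rather than of either address being invalid.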
Steps to Reproduce the Problem
temporal:
  image: temporalio/auto-setup:latest
  restart: "on-failure:5"
  networks:
    - backend
    - backend2
  environment:
    - "DB=postgres"
    - "DB_PORT=26257"
    - "POSTGRES_USER=root"
    - "POSTGRES_PWD=postgres"
    - "POSTGRES_SEEDS=postgres"
  ports:
    - 7233
I'm hitting the same error on Azure App Service. My docker-compose.yml is very similar to the default one, just without a cassandra container. No explicit networks configuration.
I get
{"level":"fatal","ts":"2020-09-07T21:00:28.845Z","msg":"ListenIP failed, unable to parse bindOnIP value or it
is not IPv4 address","address":"172.16.3.2 172.16.0.3","logging-call-at":"rpc.go:186","stacktrace":
"go.temporal.io/server/common/log/loggerimpl.(*loggerImpl).Fatal\n\t/temporal/common/log/loggerimpl/logger.go:144
\ngo.temporal.io/server/common/rpc.getListenIP\n\t/temporal/common/rpc/rpc.go:186
\ngo.temporal.io/server/common/rpc.(*RPCFactory).GetGRPCListener\n\t/temporal/common/rpc/rpc.go:126
\ngo.temporal.io/server/common/resource.New\n\t/temporal/common/resource/resourceImpl.go:154
\ngo.temporal.io/server/service/history.NewService\n\t/temporal/service/history/service.go:479
\ngo.temporal.io/server/cmd/server/temporal.(*server).startService\n\t/temporal/cmd/server/temporal/server.go:265
\ngo.temporal.io/server/cmd/server/temporal.(*server).Start\n\t/temporal/cmd/server/temporal/server.go:85
\ngo.temporal.io/server/cmd/server/temporal.startHandler\n\t/temporal/cmd/server/temporal/temporal.go:91
\ngo.temporal.io/server/cmd/server/temporal.BuildCLI.func1\n\t/temporal/cmd/server/temporal/temporal.go:211
\ngithub.com/urfave/cli.HandleAction\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:528
\ngithub.com/urfave/cli.Command.Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/command.go:174
\ngithub.com/urfave/cli.(*App).Run\n\t/go/pkg/mod/github.com/urfave/[email protected]/app.go:279
\nmain.main\n\t/temporal/cmd/server/main.go:38\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
Any workarounds?
@wxing1292 can you confirm if this is still a problem?
I got the same error as soon as I added the second network:
networks:
  - backend
  - backend2
Is a solution to this problem planned?
Craziness. I've broken my brain solving this, especially since there is no documentation at all for:
- this config: https://github.com/temporalio/temporal/blob/master/docker/config_template.yaml
- this config: https://github.com/temporalio/temporal/blob/master/config/dynamicconfig/development_es.yaml
- an explanation of how exactly the second of the above-mentioned configs should be used to override the first one (They have different structures! Actually, I cannot see what they have in common at all: they do not seem to share the same keys, at least in the examples linked above. For instance, you won't find the word "system" in the file the first link points to). The only documentation is "you can" here. Wow... Thanks... It helped (sarcasm).
- an explanation of the environment variables used by temporal server. The only explanation on the temporal.io website is about temporal-web, not about temporal server itself.
So after an evening of investigating I came across these issues:
- https://community.temporal.io/t/temporalio-temporal-server-overwrite-the-127-0-0-1-7233-ip-address-to-something-else/544
- https://github.com/temporalio/temporal/blob/master/docker/config_template.yaml#L214
- https://github.com/temporalio/temporal/blob/master/config/dynamicconfig/README.md
From these I indirectly understood that I could bind to 0.0.0.0. Of course I could: that was the first thing I tried, but I immediately hit another error saying I had to provide a broadcastAddress. I had no idea how (see my remarks about the docs above), and I still have no idea how to do it via the config file. I also understood that broadcastAddress is used only for cluster intercommunication, and I saw a few interesting env vars mentioned in those issues.
So this seems to work:
temporal:
  depends_on:
    - mysql
    - elasticsearch
  environment:
    DBNAME: temporal
    VISIBILITY_DBNAME: temporal_visibility
    DB: mysql
    MYSQL_USER: temporal
    MYSQL_PWD: <passwd_here>
    MYSQL_SEEDS: mysql
    DYNAMIC_CONFIG_FILE_PATH: config/dynamicconfig/development_es.yaml
    ENABLE_ES: true
    ES_SEEDS: elasticsearch
    ES_VERSION: v7
    BIND_ON_IP: 0.0.0.0
    TEMPORAL_BROADCAST_ADDRESS: 127.0.0.1
  image: temporalio/auto-setup:1.14.0
  volumes:
    - ./dynamicconfig:/etc/temporal/config/dynamicconfig
  networks:
    - default # perform DB queries
    - traefik # receive requests from a load balancer
  labels:
    traefik.enable: 'true'
    traefik.frontend.rule: 'Host: temporal.local'
    traefik.port: '7233'
    traefik.protocol: 'h2c'
The solution is the last two env vars. You can bind to 0.0.0.0 while using 127.0.0.1 as the broadcast address for cluster intercom:
BIND_ON_IP: 0.0.0.0
TEMPORAL_BROADCAST_ADDRESS: 127.0.0.1
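My understanding of why both variables are needed: 0.0.0.0 is fine as a listen address, but it is not something a peer can dial, so a separate address has to be advertised. Here is a rough sketch of that rule in Go. `advertisedAddress` is a hypothetical helper I made up to illustrate the idea, not how Temporal actually resolves its broadcast address.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// advertisedAddress returns the address peers should be told to dial.
// Hypothetical helper illustrating the rule; not Temporal's implementation.
func advertisedAddress(bindIP, broadcast string) (string, error) {
	bind := net.ParseIP(bindIP)
	if bind == nil {
		return "", fmt.Errorf("invalid bind IP %q", bindIP)
	}
	// An explicit broadcast address always wins.
	if broadcast != "" {
		if net.ParseIP(broadcast) == nil {
			return "", fmt.Errorf("invalid broadcast address %q", broadcast)
		}
		return broadcast, nil
	}
	// 0.0.0.0 means "listen on every interface"; peers cannot connect to it.
	if bind.IsUnspecified() {
		return "", errors.New("bound to a wildcard address; an explicit broadcast address is required")
	}
	// A concrete bind IP can double as the advertised address.
	return bind.String(), nil
}

func main() {
	// The combination from the compose file above.
	fmt.Println(advertisedAddress("0.0.0.0", "127.0.0.1"))
	// Wildcard bind without a broadcast address: rejected.
	fmt.Println(advertisedAddress("0.0.0.0", ""))
	// Concrete bind IP: no broadcast address needed.
	fmt.Println(advertisedAddress("172.23.0.4", ""))
}
```

This also matches what I saw in practice: binding to 0.0.0.0 alone produced an error about the broadcast address, and setting both made it go away.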
Seems like not many people are running into this problem if it went unanswered for 1.5 years. Does everyone just expose a new port for every single service instead of using a custom CA or Let's Encrypt and a load balancer? Really? Guuuuyz! How do you even remember all those port numbers?
And one more thing. This is not a bug. Because:
- You are able to bind 0.0.0.0
- You have to know your node IP address to set up a production cluster environment (when 127.0.0.1 is not an option)
As far as I'm concerned, the issue could be closed, since the above solution seems to work. It would be nice to improve the docs, though. Yes, I know I'm free to contribute instead of complaining :-)
BTW, another solution would be a way to disable cluster intercom entirely if it is optional. I believe it's optional if 127.0.0.1 is okay.
Adding a healthcheck to temporal is tough: curl against localhost doesn't work because the server isn't bound to 127.0.0.1.
healthcheck:
  test: ["CMD-SHELL", "curl -s http://localhost:7233/health || exit 1"]
If temporal is bound to a specific interface, this test works instead:
  test: ["CMD-SHELL", 'temporal operator cluster health --address $(hostname -i):7233']