Some CLI commands break for application release with TLS encrypted distributed erlang node configuration
Steps to reproduce
- add tls encrypted distributed erlang node communication support for the application (I'm using vm.args in this case):
-kernel proto_dist inet_tls
-kernel ssl_dist_opt server_verify verify_peer
-kernel ssl_dist_opt client_verify verify_peer
-kernel ssl_dist_opt client_certfile "/path/certfile.pem"
-kernel ssl_dist_opt server_certfile "/path/certfile.pem"
-kernel ssl_dist_opt client_cacertfile "/path/certfile.pem"
-kernel ssl_dist_opt server_fail_if_no_peer_cert true
- create a release of the app
- run the ./bin/app_name start command and observe that the app launches successfully
- try to run the ping command, ./bin/app_name ping, and observe that it fails with: Received 'pang' from [app_name]!
- the stop, remote_console, and rpc commands also fail w/ node not found
- however, the app is actually launched and running successfully, as seen by viewing the process list
Interestingly, ./bin/app_name attach works and initializes a usable iex session. Once that session is established, it's possible to verify that the clustering communication is working between two launched application nodes (confirming tls configuration is working correctly).
If you remove just the TLS configuration lines from vm.args, the app starts successfully and then those commands work again.
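For anyone repeating that comparison, here is a minimal sketch of producing a TLS-free copy of vm.args. The helper name strip_tls_flags and the throwaway demo file are made up for illustration and are not part of Distillery; the pattern assumes the TLS lines use the -kernel proto_dist / -kernel ssl_dist_opt form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Distillery): print a vm.args file
# without the TLS distribution lines, leaving every other flag untouched.
strip_tls_flags() {
    grep -v -E '^-kernel (proto_dist|ssl_dist_opt) ' "$1"
}

# Demo against a throwaway vm.args that mimics the layout used in this report.
demo_vm_args=$(mktemp)
cat > "$demo_vm_args" <<'EOF'
-name app_name@host
-setcookie cookie
-kernel proto_dist inet_tls
-kernel ssl_dist_opt server_verify verify_peer
EOF

PLAIN_VM_ARGS=$(strip_tls_flags "$demo_vm_args")
echo "$PLAIN_VM_ARGS"
rm -f "$demo_vm_args"
```

Redirecting the output to a second file keeps the original vm.args intact, so the two variants can be swapped when building the release.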
Perhaps it has something to do w/ how the cli commands are being run outside of the full application context?
I've also tried setting the same TLS configuration through the ELIXIR_ERL_OPTIONS env var, which distillery's elixir call uses for erlang runtime options when running the commands (https://github.com/bitwalker/distillery/blob/49d88194ad5f100239fa146bb2c649988f6b399f/priv/libexec/erts.sh#L169). I set that env variable like this, mirroring the vm.args TLS config:
ELIXIR_ERL_OPTIONS="-proto_dist inet_tls \
-ssl_dist_opt client_certfile /path/certfile.pem \
-ssl_dist_opt server_certfile /path/certfile.pem \
-ssl_dist_opt client_cacertfile /path/certfile.pem \
-ssl_dist_opt client_verify verify_peer \
-ssl_dist_opt server_verify verify_peer \
-ssl_dist_opt server_fail_if_no_peer_cert true"
export ELIXIR_ERL_OPTIONS
You can see when calling ping w/ DEBUG_BOOT=true that those flags are properly passed into the erlang call:
++1551363621 ERL='-noshell -s elixir start_cli -logger handle_sasl_reports false'
++1551363621 erl -proto_dist inet_tls -ssl_dist_opt client_certfile /home/app_name/shared/certs/app_name_client_certfile.pem -ssl_dist_opt server_certfile /home/app_name/shared/certs/app_name_server_certfile.pem -ssl_dist_opt client_cacertfile /home/app_name/shared/certs/app_name_cacert.pem -ssl_dist_opt client_verify verify_peer -ssl_dist_opt server_verify verify_peer -noshell -s elixir start_cli -logger handle_sasl_reports false -extra -e Mix.Releases.Runtime.Control.main --logger-sasl-reports false -- ping --name=app_name@host '--cookie=[cookie]'
+++1551363621 whereis_erts_bin
+++1551363621 '[' -z 10.2.1 ']'
+++1551363621 '[' -z '' ']'
+++1551363621 __erts_dir=/home/app_name/releases/20190222201327/erts-10.2.1
+++1551363621 '[' -d /home/app_name/releases/20190222201327/erts-10.2.1 ']'
+++1551363621 echo /home/app_name/releases/20190222201327/erts-10.2.1/bin
++1551363621 __bin=/home/app_name/releases/20190222201327/erts-10.2.1/bin
++1551363621 '[' -z /home/app_name/releases/20190222201327/erts-10.2.1/bin ']'
++1551363621 __erl=/home/app_name/releases/20190222201327/erts-10.2.1/bin/erl
++1551363621 __boot_provided=0
++1551363621 grep '\-boot '
++1551363621 echo -proto_dist inet_tls -ssl_dist_opt client_certfile /home/app_name/shared/certs/app_name_client_certfile.pem -ssl_dist_opt server_certfile /home/app_name/shared/certs/app_name_server_certfile.pem -ssl_dist_opt client_cacertfile /home/app_name/shared/certs/app_name_cacert.pem -ssl_dist_opt client_verify verify_peer -ssl_dist_opt server_verify verify_peer -noshell -s elixir start_cli -logger handle_sasl_reports false -extra -e Mix.Releases.Runtime.Control.main --logger-sasl-reports false -- ping --name=app_name@host '--cookie=[cookie]'
++1551363621 __erts_included=0
++1551363621 [[ /home/app_name/releases/20190222201327/erts-10.2.1/bin/erl =~ ^/home/app_name/releases/20190222201327 ]]
++1551363621 __erts_included=1
++1551363621 '[' 1 -eq 1 ']'
++1551363621 '[' 0 -eq 1 ']'
++1551363621 '[' 1 -eq 1 ']'
++1551363621 /home/app_name/releases/20190222201327/erts-10.2.1/bin/erl -boot_var ERTS_LIB_DIR /home/app_name/releases/20190222201327/lib -boot /home/app_name/releases/20190222201327/bin/start_clean -config /home/app_name/releases/20190222201327/var/sys.config -proto_dist inet_tls -ssl_dist_opt client_certfile /home/app_name/shared/certs/app_name_client_certfile.pem -ssl_dist_opt server_certfile /home/app_name/shared/certs/app_name_server_certfile.pem -ssl_dist_opt client_cacertfile /home/app_name/shared/certs/app_name_cacert.pem -ssl_dist_opt client_verify verify_peer -ssl_dist_opt server_verify verify_peer -noshell -s elixir start_cli -logger handle_sasl_reports false -extra -e Mix.Releases.Runtime.Control.main --logger-sasl-reports false -- ping --name=app_name@host '--cookie=[cookie]'
▸ Received 'pang' from app_name@host!
▸ Possible reasons for this include:
▸ - The cookie is mismatched between us and the target node
▸ - We cannot establish a remote connection to the node
Description of issue
- What are the expected results?
- When using distillery cli commands for a running application with TLS configuration options set in vm.args, ping, stop, rpc (and maybe others) should work against the running app and not fail. (Those commands work fine for the same app without the TLS config.)
- What version of Distillery?
2.0.12
- What OS, Erlang/Elixir versions are you seeing this issue on?
OS: CentOS
Erlang: 10.2.3 (OTP 21.2.2)
Elixir: 1.8.0
- If possible, also provide your rel/config.exs, as it is often my first troubleshooting question, and you'll save us both time :)
use Mix.Config
# Configures the endpoint
config :app_name, app_nameWeb.Endpoint,
url: [host: "localhost"],
secret_key_base: "[secret]",
render_errors: [view: app_nameWeb.ErrorView, accepts: ~w(html json)],
pubsub: [name: app_name.PubSub, adapter: Phoenix.PubSub.PG2]
# Configures Elixir's Logger
config :logger, :console,
format: "$time $metadata[$level] $message\n",
metadata: [:request_id]
# tell logger to load a LoggerFileBackend processes
config :logger,
backends: [{LoggerFileBackend, :file_log}, :console, {LoggerFileBackend, :logstash_log}]
config :logger, utc_log: true
config :logger,
backends: [{LoggerFileBackend, :file_log}, :console, {LoggerFileBackend, :logstash_log}]
config :logger, :file_log,
path: "log/app_name.log",
format: "$time $metadata[$level] $message\n",
metadata: [:request_id]
config :logger, :console,
format: "$time $metadata[$level] $message\n",
metadata: [:request_id]
config :logger, :logstash_log,
level: :info,
path: "log/logstash.log",
format: {IoraLogging.Formatter, :format},
metadata: [
:request_id,
:host,
:method,
:path,
:status,
:filtered_params,
:state,
:duration,
:ip,
:format,
:controller,
:action,
:tags,
:user_agent,
:user,
:access_token
]
# Use Jason for JSON parsing in Phoenix
config :phoenix, :json_library, Jason
config :phoenix, :filter_parameters, ["password", "access_token"]
# Import environment specific config. This must remain at the bottom
# of this file so it overrides the configuration defined above.
import_config "#{Mix.env()}.exs"
- If this is a runtime configuration issue, please also provide your config file (with any sensitive information stripped of course). This is almost always necessary to understand why some configuration may not be working.
Portion of vm.args that breaks those commands:
-kernel proto_dist inet_tls
-kernel ssl_dist_opt server_verify verify_peer
-kernel ssl_dist_opt client_verify verify_peer
-kernel ssl_dist_opt client_certfile "/path/certfile.pem"
-kernel ssl_dist_opt server_certfile "/path/certfile.pem"
-kernel ssl_dist_opt client_cacertfile "/path/certfile.pem"
-kernel ssl_dist_opt server_fail_if_no_peer_cert true
Thanks for any and all thoughts and help w/ this!
Yeah, this is because many of the commands don't use the vm.args file directly: it doesn't apply to them, only to the running node (which is why start/foreground/console work). attach only works because it bypasses networking entirely and connects via a domain socket. We would need to specifically handle the TLS config vars from vm.args and pass them as extra options to erl/erlexec.
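As a rough illustration of the "handle the TLS config vars from vm.args" step, the sketch below pulls the TLS distribution lines out of a vm.args file and rewrites them into the flag form used in the ELIXIR_ERL_OPTIONS attempt earlier in this thread. The helper name tls_flags_from_vm_args and the demo file are hypothetical and not part of Distillery. Note that the report above shows that passing these flags through ELIXIR_ERL_OPTIONS alone still ended in 'pang', so this only automates the extraction and is not a confirmed fix.

```shell
#!/bin/sh
# Hypothetical helper (not part of Distillery): collect the TLS distribution
# flags from a vm.args file and print them as one line of erl-style flags.
# Assumes the "-kernel proto_dist ..." / "-kernel ssl_dist_opt ..." layout
# from this issue, and paths without spaces (quotes are simply removed).
tls_flags_from_vm_args() {
    grep -E '^-kernel (proto_dist|ssl_dist_opt) ' "$1" \
        | sed -e 's/^-kernel /-/' -e 's/"//g' \
        | tr '\n' ' ' \
        | sed -e 's/ $//'
}

# Demo against a throwaway vm.args that mimics the one in this issue.
demo_vm_args=$(mktemp)
cat > "$demo_vm_args" <<'EOF'
-name app_name@host
-setcookie cookie
-kernel proto_dist inet_tls
-kernel ssl_dist_opt server_verify verify_peer
-kernel ssl_dist_opt client_certfile "/path/certfile.pem"
EOF

ELIXIR_ERL_OPTIONS=$(tls_flags_from_vm_args "$demo_vm_args")
export ELIXIR_ERL_OPTIONS
echo "$ELIXIR_ERL_OPTIONS"
rm -f "$demo_vm_args"
```

Non-TLS lines such as -name and -setcookie are deliberately left out, since the CLI commands already receive the node name and cookie as their own arguments.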