rebar3
Crash and log corruption without extended start script
Issue
Using a minimal project with a release but without an extended start script results in the program crashing at startup with garbled output.
Using the project linked at the end of this message, run rebar3 release, then ./_build/default/rel/foo/bin/foo:
{erl_prim_loader,file_error}
[70,105,108,101,32,111,112,101,114,97,116,105,111,110,32,101,114,114,111,114,58,32,98,97,100,97,114,103,46,32,84,97,114,103,101,116,58,32,<<"/tmp/foo/_build/default/rel/foo/releases/0.0.0/start">>,<<>>,46,98,111,111,116,46,32,70,117,110,99,116,105,111,110,58,32,103,101,116,95,102,105,108,101,46,32]
{erl_prim_loader,file_error}
[70,105,108,101,32,111,112,101,114,97,116,105,111,110,32,101,114,114,111,114,58,32,98,97,100,97,114,103,46,32,84,97,114,103,101,116,58,32,47,116,109,112,47,102,111,111,47,95,98,117,105,108,100,47,100,101,102,97,117,108,116,47,114,101,108,47,102,111,111,47,98,105,110,47,<<"/tmp/foo/_build/default/rel/foo/releases/0.0.0/start">>,<<>>,46,98,111,111,116,46,32,70,117,110,99,116,105,111,110,58,32,103,101,116,95,102,105,108,101,46,32]
(no lo{g"gienri tp rteesremnitn)a tuinnegx pienc tdeod_ blooogtg"e,r{ bmaedsasragg,e[:{ e{rlloagn,ge,rlriosrt,_"tEor_raotro mi,n[ [p<r<o"c/etsmsp /~fpo ow/i_tbhu ielxdi/td evfaaluulet:/~rne~lp/~fno"o,/[r<e0l.e9a.s0e"s,/{0b.a0d.a0r/gs,t[a{retr"l>a>n,g<,<l>i>s,t4_6t,o9_8a,t1o1m1,,[1[11<,<1"1/6t]m]p,/[f]o}o,/{_ibnuiitl,dg/edte_fbaouoltt,/2r,e[l]/}f,o{oi/nrietl,edaos_ebso/o0t.,03.,0[/]s}t]a}r}t
>>,<<>>,46,98,111,111,116]],[]},{init,get_boot,2,[]},{init,do_boot,3,[]}]}],#{error_logger=>#{emulator=>true,tag=>error},gl=><0.0.0>,pid=><0.9.0>,time=>1620629184595737}}
init terminating in do_boot ({badarg,[{erlang,list_to_atom,[[_]],[]},{init,get_boot,2,[]},{init,do_boot,3,[]}]})
Crash dump is being written to: erl_crash.dump...done
A similar issue was reported on #2229.
An interesting thing is that the output references _build/default/rel/foo/releases/0.0.0/start, which does not exist; there is start.boot but no start file. The foo script produced in the release, which is the main executable, contains:
[ -f "$REL_DIR/$REL_NAME.boot" ] && BOOTFILE="$REL_NAME" || BOOTFILE=start
This is strange, given that BOOTFILE does not seem to be initialized beforehand. In any case, the default value start is clearly incorrect.
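To make the behaviour of that line easier to follow, here is a minimal sketch that wraps it in a function and runs it against a temporary directory. The function name pick_bootfile and the temporary directory are assumptions made for illustration; only the test-and-assign line itself comes from the generated script. Judging by that line alone, both branches assign BOOTFILE, so it does not rely on an earlier initialization.

```shell
#!/bin/sh
# pick_bootfile is a hypothetical wrapper around the line quoted from the
# generated start script; it is not part of rebar3 or relx.
pick_bootfile() {
    REL_DIR=$1
    REL_NAME=$2
    # Line under study: prefer <REL_NAME>.boot when it exists, else "start".
    [ -f "$REL_DIR/$REL_NAME.boot" ] && BOOTFILE="$REL_NAME" || BOOTFILE=start
    echo "$BOOTFILE"
}

tmp=$(mktemp -d)

# Case 1: only start.boot exists, as in the release described in this issue.
touch "$tmp/start.boot"
only_start=$(pick_bootfile "$tmp" foo)

# Case 2: a boot file named after the release also exists.
touch "$tmp/foo.boot"
named=$(pick_bootfile "$tmp" foo)

rm -rf "$tmp"

echo "without foo.boot: $only_start"
echo "with foo.boot:    $named"
```

The value is a boot file name without its extension in both cases.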
Using start.boot itself does not resolve the problem though, so there has to be something else. Finding out why the error is borked is even more important.
Beyond the problem at hand, I noticed that the script does not set the -u sh option (it uses set -e only). I would strongly recommend using set -eu, which can detect potential issues resulting from uninitialized variables.
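As a small illustration of the difference, the sketch below runs the same one-liner in two child shells, once with set -e and once with set -eu. The variable REL_NAME is explicitly unset inside each child shell, and the snippet is an assumption for demonstration purposes rather than an excerpt from the generated script.

```shell
#!/bin/sh
# With set -e only, an unset variable silently expands to the empty string
# and the script carries on.
e_only=$(sh -c 'set -e; unset REL_NAME; echo "boot=[$REL_NAME]"' 2>/dev/null) \
    && e_only_status=0 || e_only_status=$?

# With set -eu, expanding the unset variable is an error and the child
# shell exits with a non-zero status before echo runs.
eu=$(sh -c 'set -eu; unset REL_NAME; echo "boot=[$REL_NAME]"' 2>/dev/null) \
    && eu_status=0 || eu_status=$?

echo "set -e : status=$e_only_status output=$e_only"
echo "set -eu: status=$eu_status output=$eu"
```

In the first case the script continues with an empty value, which is the kind of mistake that stays hidden until much later.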
Environment
- Erlang 23.3.2.
- rebar3 3.15.1.
Data
The .boot is assumed by Erlang when you pass a -boot arg to erl.
Output like this usually means there is something messed up in vm.args.
yeah the foo.zip sample comes with no vm.args at all, so that's a bit funnier, since the only stuff passed in are commented files we generate as defaults that set a name and a cookie.
I went in and ran the commands manually and replicated the failure a couple of times, started from scratch, reproduced again. Then I added a little echo "$BINDIR/erlexec" "$@" on top of the start script to get the command line as it ran exactly. I got a quick error for BINDIR not existing when running the command manually, so I set it to the ERTS bin path and it worked.
So the interesting thing I get is that calling the command directly works fine, but calling exec "$BINDIR/erlexec" "$@" is what fails.
This let me figure out that the issue was:
exec "$BINDIR/erlexec" "$@"
When I replace this with
exec "$BINDIR/erlexec" $@
Then it actually boots fine. This is sort of weird because usually the "$@" is something shellcheck would warn you about and be angry if you don't get right, I guess.
Fixing in https://github.com/erlware/relx/pull/869
I noticed this a while back with VerneMQ: https://github.com/vernemq/vernemq/pull/1740#issuecomment-785813380 where from a specific rebar3 release, it would create start.boot instead of vernemq.boot in the releases subdirs.
I just reverted to an older rebar3 release back then but if it's useful I can try to find out where exactly the behaviour changes.
Thanks for the relx fix! (will test this too)
@ferd after encountering this as well and some more digging, I believe the issue is actually the presence of empty arguments to erlexec. Modifying the end of the start script as
# Try removing the quotes around "$@"
EXEC=("$BINDIR/erlexec" "$@")
echo ----
for arg in "${EXEC[@]}"; do
echo \""$arg"\"
done
echo ----
# Boot the release
exec "${EXEC[@]}"
shows that with "$@" the output is
"/erlang/24.0.3/erts-12.0.3/bin/erlexec"
""
"-config"
"/app/_build/prod/rel/example/releases/0.1.0/sys.config"
"-args_file"
"/app/_build/prod/rel/example/releases/0.1.0/vm.args"
"-boot_var"
"ERTS_LIB_DIR"
"/erlang/24.0.3/erts-12.0.3/../lib"
"-boot"
"/app/_build/prod/rel/example/releases/0.1.0/start"
""
and with $@ it is
"/erlang/24.0.3/erts-12.0.3/bin/erlexec"
"-config"
"/app/_build/prod/rel/example/releases/0.1.0/sys.config"
"-args_file"
"/app/_build/prod/rel/example/releases/0.1.0/vm.args"
"-boot_var"
"ERTS_LIB_DIR"
"/erlang/24.0.3/erts-12.0.3/../lib"
"-boot"
"/app/_build/prod/rel/example/releases/0.1.0/start"
With this in mind, I suspect that $@ may introduce other problems with unintentional word splitting, though perhaps it's not expected or supported to encounter an argument with a space in it.
In any case, if I may, I'd be happy to submit a PR to https://github.com/erlware/relx to remove empty arguments before calling erlexec.
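For illustration, here is a minimal sketch of the trade-off and of one way to drop empty arguments while keeping "$@" quoted. It is not the code from the relx PR; the helper count_args and the sample argument list (including the path with a space) are made up for the example.

```shell
#!/bin/sh
# count_args is a hypothetical helper that prints how many arguments it got.
count_args() { echo "$#"; }

# Simulated argument list: empty first and last arguments, as in the output
# above, plus a made-up path containing a space.
set -- "" -config "/tmp/my app/sys.config" -boot "/tmp/rel/start" ""

# Quoted: every argument is preserved, including the two empty ones.
quoted=$(count_args "$@")

# Unquoted: the empty arguments vanish, but "/tmp/my app/sys.config" is
# also split into two words.
unquoted=$(count_args $@)

# Filter: rebuild the positional parameters without the empty strings.
# The list in "for" is expanded once, so shifting inside the loop is safe.
for arg in "$@"; do
    shift
    if [ -n "$arg" ]; then
        set -- "$@" "$arg"
    fi
done
filtered=$(count_args "$@")

echo "quoted=$quoted unquoted=$unquoted filtered=$filtered"
echo "second argument after filtering: $2"
```

After the loop, exec "$BINDIR/erlexec" "$@" would receive neither the empty strings nor any accidentally split words, assuming the empty arguments are the only problem.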
yeah if you can submit that I'd be happy to review and merge.
Thanks @ferd! Here it is: https://github.com/erlware/relx/pull/879