
Crash and log corruption without extended start script

Open galdor opened this issue 3 years ago • 7 comments

Issue

Using a minimal project with a release but without an extended start script results in the program crashing at startup with garbled output.

Using the project linked at the end of this message, run rebar3 release, then ./_build/default/rel/foo/bin/foo:

{erl_prim_loader,file_error}
[70,105,108,101,32,111,112,101,114,97,116,105,111,110,32,101,114,114,111,114,58,32,98,97,100,97,114,103,46,32,84,97,114,103,101,116,58,32,<<"/tmp/foo/_build/default/rel/foo/releases/0.0.0/start">>,<<>>,46,98,111,111,116,46,32,70,117,110,99,116,105,111,110,58,32,103,101,116,95,102,105,108,101,46,32]
{erl_prim_loader,file_error}
[70,105,108,101,32,111,112,101,114,97,116,105,111,110,32,101,114,114,111,114,58,32,98,97,100,97,114,103,46,32,84,97,114,103,101,116,58,32,47,116,109,112,47,102,111,111,47,95,98,117,105,108,100,47,100,101,102,97,117,108,116,47,114,101,108,47,102,111,111,47,98,105,110,47,<<"/tmp/foo/_build/default/rel/foo/releases/0.0.0/start">>,<<>>,46,98,111,111,116,46,32,70,117,110,99,116,105,111,110,58,32,103,101,116,95,102,105,108,101,46,32]
(no lo{g"gienri tp rteesremnitn)a tuinnegx pienc tdeod_ blooogtg"e,r{ bmaedsasragg,e[:{ e{rlloagn,ge,rlriosrt,_"tEor_raotro mi,n[ [p<r<o"c/etsmsp /~fpo ow/i_tbhu ielxdi/td evfaaluulet:/~rne~lp/~fno"o,/[r<e0l.e9a.s0e"s,/{0b.a0d.a0r/gs,t[a{retr"l>a>n,g<,<l>i>s,t4_6t,o9_8a,t1o1m1,,[1[11<,<1"1/6t]m]p,/[f]o}o,/{_ibnuiitl,dg/edte_fbaouoltt,/2r,e[l]/}f,o{oi/nrietl,edaos_ebso/o0t.,03.,0[/]s}t]a}r}t
>>,<<>>,46,98,111,111,116]],[]},{init,get_boot,2,[]},{init,do_boot,3,[]}]}],#{error_logger=>#{emulator=>true,tag=>error},gl=><0.0.0>,pid=><0.9.0>,time=>1620629184595737}}
init terminating in do_boot ({badarg,[{erlang,list_to_atom,[[_]],[]},{init,get_boot,2,[]},{init,do_boot,3,[]}]})

Crash dump is being written to: erl_crash.dump...done

A similar issue was reported on #2229.

An interesting thing is that the output references _build/default/rel/foo/releases/0.0.0/start which does not exist; there is start.boot but no start file. The foo script being produced in the release, which is the main executable, contains:

[ -f "$REL_DIR/$REL_NAME.boot" ] && BOOTFILE="$REL_NAME" || BOOTFILE=start

Which is strange given that BOOTFILE does not seem to be initialized before. In any case, the default value start is clearly incorrect.

Using start.boot itself does not resolve the problem though, so there has to be something else. Finding out why the error is borked is even more important.

Beyond the problem at hand, I noticed that the script does not set the -u sh option (it only uses set -e). I would strongly recommend using set -eu, which can detect potential issues resulting from uninitialized variables.
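
To illustrate the set -eu suggestion, here is a minimal sketch of the difference the -u option makes. It is not taken from the generated start script; UNSET_EXAMPLE_VAR is a hypothetical variable name, and the child shells are started with sh -c only so that the failure does not abort the demonstration itself.

```shell
#!/bin/sh
# Sketch: how `set -u` turns a silent empty expansion into a hard error.
# UNSET_EXAMPLE_VAR is a hypothetical name used only for illustration.

# Without -u, an unset variable silently expands to the empty string.
without_u=$(sh -c 'unset UNSET_EXAMPLE_VAR; echo "value=[$UNSET_EXAMPLE_VAR]"' 2>&1)
echo "$without_u"

# With -u, the same expansion aborts the child shell with a non-zero status
# before echo ever runs.
if sh -c 'set -eu; unset UNSET_EXAMPLE_VAR; echo "value=[$UNSET_EXAMPLE_VAR]"' 2>/dev/null; then
    with_u_status=0
else
    with_u_status=$?
fi
echo "exit status with -u: $with_u_status"
```

A script that enables -u can still allow optional variables by giving them an explicit default, for example ${UNSET_EXAMPLE_VAR:-}.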

Environment

  • Erlang 23.3.2.
  • rebar3 3.15.1.

Data

foo.zip

galdor avatar May 10 '21 06:05 galdor

The .boot is assumed by Erlang when you pass a -boot arg to erl.
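
As a sketch of how this fits with the BOOTFILE line quoted in the issue: that one-liner assigns BOOTFILE on both branches, and the resulting path is passed to -boot without the extension. The version below spells it out as an if/else. The temporary directory and the release name foo are stand-ins for illustration, not the real release layout.

```shell
#!/bin/sh
# Sketch: how the one-liner picks BOOTFILE, rewritten as an explicit if/else.
# REL_DIR and REL_NAME here are hypothetical stand-ins for the script's values.
REL_DIR=$(mktemp -d)
REL_NAME=foo

# Only start.boot exists, as in the reported release.
: > "$REL_DIR/start.boot"

if [ -f "$REL_DIR/$REL_NAME.boot" ]; then
    BOOTFILE=$REL_NAME
else
    BOOTFILE=start
fi

# erl appends ".boot" to the -boot argument itself, so the path is passed
# without the extension.
boot_arg="$REL_DIR/$BOOTFILE"
echo "-boot $boot_arg"

# The file that erl will actually look for is "$boot_arg.boot".
if [ -f "$boot_arg.boot" ]; then
    boot_exists=yes
else
    boot_exists=no
fi
echo "boot file present: $boot_exists"

rm -rf "$REL_DIR"
```

Under these assumptions, a path ending in .../start in the error output is expected even though only start.boot exists on disk.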

Output like this usually means there is something messed up in vm.args.

tsloughter avatar May 10 '21 11:05 tsloughter

yeah the foo.zip sample comes with no vm.args at all, so that's a bit funnier, since the only stuff passed in are commented files we generate as defaults that set a name and a cookie.

I went in and ran the commands manually and replicated the failure a couple of times, started from scratch, reproduced again. Then I added a little echo "$BINDIR/erlexec" "$@" on top of the start script to get the command line as it ran exactly. I got a quick error for BINDIR not existing when running the command manually, so I set it to the ERTS bin path and it worked.

So the interesting thing I get is that calling the command directly works fine, but calling exec "$BINDIR/erlexec" "$@" is what fails.

This let me figure out that the issue was:

exec "$BINDIR/erlexec" "$@"

When I replace this with

exec "$BINDIR/erlexec" $@

Then it actually boots fine. This is sort of weird, because quoting "$@" is usually the thing ShellCheck warns you about and gets angry over if you don't get it right, I guess.

ferd avatar May 10 '21 12:05 ferd

Fixing in https://github.com/erlware/relx/pull/869

ferd avatar May 11 '21 17:05 ferd

I noticed this a while back with VerneMQ: https://github.com/vernemq/vernemq/pull/1740#issuecomment-785813380 where, starting from a specific rebar3 release, it would create start.boot instead of vernemq.boot in the releases subdirs.

I just reverted to an older rebar3 release back then but if it's useful I can try to find out where exactly the behaviour changes. Thanks for the relx fix! (will test this too)

ioolkos avatar May 11 '21 19:05 ioolkos

@ferd after encountering this as well and doing some more digging, I believe the issue is actually the presence of empty arguments to erlexec. Modifying the end of the start script as

# Try removing the quotes around "$@"
EXEC=("$BINDIR/erlexec" "$@")

echo ----
for arg in "${EXEC[@]}"; do
    echo \""$arg"\"
done
echo ----

# Boot the release
exec "${EXEC[@]}"

Shows that with "$@" the output is

"/erlang/24.0.3/erts-12.0.3/bin/erlexec"
""
"-config"
"/app/_build/prod/rel/example/releases/0.1.0/sys.config"
"-args_file"
"/app/_build/prod/rel/example/releases/0.1.0/vm.args"
"-boot_var"
"ERTS_LIB_DIR"
"/erlang/24.0.3/erts-12.0.3/../lib"
"-boot"
"/app/_build/prod/rel/example/releases/0.1.0/start"
""

and with $@ it is

"/erlang/24.0.3/erts-12.0.3/bin/erlexec"
"-config"
"/app/_build/prod/rel/example/releases/0.1.0/sys.config"
"-args_file"
"/app/_build/prod/rel/example/releases/0.1.0/vm.args"
"-boot_var"
"ERTS_LIB_DIR"
"/erlang/24.0.3/erts-12.0.3/../lib"
"-boot"
"/app/_build/prod/rel/example/releases/0.1.0/start"

With this in mind, I suspect that $@ may introduce other problems with unintentional word splitting, though perhaps it's not expected or supported to encounter an argument with a space in it.
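
The word-splitting concern can be reproduced in isolation. The sketch below uses a made-up argument list containing two empty strings and one path with a space. count_args is a hypothetical helper, and the default IFS is assumed.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

# Simulate an argument list with empty strings and a path containing a space.
set -- "" "-config" "/tmp/my app/sys.config" ""

quoted=$(count_args "$@")    # every argument is passed through unchanged
unquoted=$(count_args $@)    # empty strings vanish and the path splits in two

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

The quoted form reports 4 arguments and the unquoted form reports 3: the two empty strings are dropped, but /tmp/my app/sys.config becomes two separate words.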

In any case, if I may, I'd be happy to submit a PR to https://github.com/erlware/relx to remove empty arguments before calling erlexec.
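
One possible way to remove empty arguments without giving up the quoting is sketched below. It is an illustration in POSIX sh, not necessarily the approach taken in the relx pull request, and the argument values are made up.

```shell
#!/bin/sh
# Sketch: drop empty arguments from "$@" while keeping every other argument
# intact, including arguments that contain spaces.

# Example argument list (hypothetical values).
set -- "" "-config" "/tmp/my app/sys.config" "-boot" "/tmp/rel/start" ""

# Rotate through the positional parameters exactly once: each argument is
# shifted off the front and re-appended at the back only if it is non-empty.
n=$#
while [ "$n" -gt 0 ]; do
    arg=$1
    shift
    if [ -n "$arg" ]; then
        set -- "$@" "$arg"
    fi
    n=$((n - 1))
done

remaining=$#
filtered=$(printf '"%s"\n' "$@")
echo "$filtered"
echo "remaining arguments: $remaining"
```

After the loop, "$@" can be passed to the final exec with its quotes in place, since no empty strings remain.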

mxxk avatar Jul 24 '21 05:07 mxxk

yeah if you can submit that I'd be happy to review and merge.

ferd avatar Jul 24 '21 13:07 ferd

Thanks @ferd! Here it is: https://github.com/erlware/relx/pull/879

mxxk avatar Jul 24 '21 21:07 mxxk