Segmentation fault based on Mix.exs compiler order

Open cgregfreeman opened this issue 7 years ago • 4 comments

I think I have traced down the segmentation fault issue that I mentioned in the other issue.

The compilers line in my mix.exs looks like this:

     compilers: [ :rustler, :phoenix, :gettext  ] ++ Mix.compilers,
     rustler_crates: rustler_crates(),

If I move the :rustler atom to the end, like so:

     compilers: [  :phoenix, :gettext , :rustler ] ++ Mix.compilers,
     rustler_crates: rustler_crates(),

I get strange behavior out of iex -S mix.

I've been able to replicate this with very simple code within a basic phoenix project.

Here is the strange behavior:

The initial run of iex -S mix works perfectly: the NIF function calls work just fine from the interactive Elixir environment.

Then I close iex and reopen it.

The next time I run it, calling the Elixir function that uses the Rust NIF gives a segmentation fault that crashes iex.

If I edit the Elixir code so that it is recompiled, I then have to run iex -S mix twice. The first run gives an Elixir compilation error. On the second run, the Elixir code compiles and the NIF call works. But if I close iex and reopen it, the segmentation fault appears.

I have been able to replicate this by switching the position of :rustler between the beginning and the end of the compilers list. I'm not sure whether Phoenix is causing this problem or whether it's just Elixir.

Here's a shot of what I have under development as an example:

~/Coding/phoenix_rust_ports_and_nifs $ iex -S mix
Erlang/OTP 19 [erts-8.3] [source] [64-bit] [smp:4:4] [async-threads:10] [kernel-poll:false]

Compiling NIF crate :nifexample (native/nifexample)...
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Compiling 13 files (.ex)
Generated phoenix_rust_ports_and_nifs app
Interactive Elixir (1.4.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> NifExample.testMap()
(4, <<"fourthEntry">>)
                      (third, 3.000000e+00)
                                           (<<"fifthEntry">>, <<"five">>)
                                                                         (<<"firstEntry">>, 1)
                                                                                              (<<"secondEntry">>, second)
     :ok
iex(2)> 
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution
^C~/Coding/phoenix_rust_ports_and_nifs $ iex -S mix
Erlang/OTP 19 [erts-8.3] [source] [64-bit] [smp:4:4] [async-threads:10] [kernel-poll:false]

Compiling NIF crate :nifexample (native/nifexample)...
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Interactive Elixir (1.4.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> NifExample.testMap()
Segmentation fault
~/Coding/phoenix_rust_ports_and_nifs $

Then I change the order back:

~/Coding/phoenix_rust_ports_and_nifs $ iex -S mix
Erlang/OTP 19 [erts-8.3] [source] [64-bit] [smp:4:4] [async-threads:10] [kernel-poll:false]

Compiling NIF crate :nifexample (native/nifexample)...
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Compiling 13 files (.ex)

== Compilation error on file lib/nif_example.ex ==

21:23:53.275 [warn]  The on_load function for module Elixir.NifExample returned {:error, {:upgrade, 'Upgrade not supported by this NIF library.'}}

** (MatchError) no match of right hand side value: {:error, :on_load_failure}
    (stdlib) erl_eval.erl:670: :erl_eval.do_apply/6
    (elixir) lib/kernel/parallel_compiler.ex:117: anonymous fn/4 in Kernel.ParallelCompiler.spawn_compilers/1

~/Coding/phoenix_rust_ports_and_nifs $ iex -S mix
Erlang/OTP 19 [erts-8.3] [source] [64-bit] [smp:4:4] [async-threads:10] [kernel-poll:false]

Compiling NIF crate :nifexample (native/nifexample)...
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Compiling 13 files (.ex)
Generated phoenix_rust_ports_and_nifs app
Interactive Elixir (1.4.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> NifExample.testMap()
(4, <<"fourthEntry">>)
                      (third, 3.000000e+00)
                                           (<<"fifthEntry">>, <<"five">>)
                                                                         (<<"firstEntry">>, 1)
                                                                                              (<<"secondEntry">>, second)
     :ok
iex(2)> 
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution
^C~/Coding/phoenix_rust_ports_and_nifs $ iex -S mix
Erlang/OTP 19 [erts-8.3] [source] [64-bit] [smp:4:4] [async-threads:10] [kernel-poll:false]

Compiling NIF crate :nifexample (native/nifexample)...
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Interactive Elixir (1.4.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> NifExample.testMap()
(4, <<"fourthEntry">>)
                      (third, 3.000000e+00)
                                           (<<"fifthEntry">>, <<"five">>)
                                                                         (<<"firstEntry">>, 1)
                                                                                              (<<"secondEntry">>, second)
     :ok
iex(2)> 


For now, I'll be leaving rustler at the beginning of the list to avoid this problem.
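
For reference, a minimal sketch of a mix.exs with that ordering. The app name, crate name, and crate path are taken from the transcript above; the module name, version, and the body of rustler_crates/0 are assumptions for illustration, and the actual project may differ.

```elixir
# Sketch only: module name, version, and crate options are assumed.
defmodule PhoenixRustPortsAndNifs.Mixfile do
  use Mix.Project

  def project do
    [app: :phoenix_rust_ports_and_nifs,
     version: "0.0.1",
     # :rustler comes first, so the NIF crate is built before the
     # :phoenix and :gettext compilers run.
     compilers: [:rustler, :phoenix, :gettext] ++ Mix.compilers,
     rustler_crates: rustler_crates()]
     # other keys (deps, elixirc_paths, ...) omitted
  end

  # Crate name and path as shown in the "Compiling NIF crate" log line;
  # mode: :debug is an assumption based on the "dev [unoptimized]" build.
  defp rustler_crates do
    [nifexample: [path: "native/nifexample", mode: :debug]]
  end
end
```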

cgregfreeman, May 17 '17 04:05

Would you mind sending me the test project?

hansihe, May 17 '17 11:05

Sure, the code I figured that out on is here: https://github.com/cgregfreeman/phoenix_rust_ports_and_nifs

The README isn't quite in sync yet.

cgregfreeman, May 17 '17 13:05

The problem with having the :phoenix and :gettext compilers before :rustler seems to be that they load modules in the user code. This causes the NIF to be loaded before the :rustler compiler has actually run.

As for why there is a segfault, I am really not sure. I am unable to reproduce it. I have tried reproducing it both on the debug build and with valgrind. There are no failed assertions, and valgrind detects no memory corruption. I'm unsure what else I can do.

It still worries me, though. The fact that there is a segfault shows that something bad is happening somewhere. I would very much like to figure out what is going on here.

hansihe, May 21 '17 00:05

I don't get a segfault, but running recompile inside iex -S mix does nothing. I have to Ctrl-C out of iex and start a new session. Not sure how to fix that kind of issue.

The same goes for any mix task, such as mix phoenix.server.

cdesch, Feb 11 '18 06:02

I think we can close this because the "compiler" doesn't exist anymore.

filmor, Oct 12 '23 14:10