`erpc:call` behavior is different on local and remote nodes
Describe the bug
`erpc:call` usually runs the given function in a spawned process. However, if the given `Node` is the local node and the given `Timeout` is `infinity` (implicit in `erpc:call/2` and `erpc:call/4`), an optimization kicks in that uses `erlang:apply` instead:
https://github.com/erlang/otp/blob/d05de4c400200fabf3b22edd8cd7f75a02d44602/lib/kernel/src/erpc.erl#L255-L260
https://github.com/erlang/otp/blob/d05de4c400200fabf3b22edd8cd7f75a02d44602/lib/kernel/src/erpc.erl#L1267-L1270
Using `apply` means that the given function is executed in the context of the process calling `erpc:call`, which can have a number of unintended consequences:
- The function could accidentally corrupt the state of the calling process or interfere with its workings, e.g. by modifying its process dictionary or its private/protected `ets` tables, closing its files or sockets, stealing its messages, changing its `trap_exit` flag, or linking/monitoring (or unlinking/demonitoring) other processes. None of this can happen if the function executes in a separate process.
- The function could cause resource leaks by opening files or sockets without closing them, creating `ets` tables, etc. If the function executes in a separate process, it can make quite a mess, all of which is cleaned up automatically when that process terminates. When the function is executed in the calling process, the mess remains.
- If the calling process exits (e.g. via a link, an explicit exit/kill, etc.) while executing the function, the function is interrupted at whatever point it has reached. When executed in a separate process, the function runs through to the end, no matter what happens to the calling process.
To Reproduce
The first example creates a private, named `ets` table. Whether the function runs in the calling process (`1>` and `2>`) or in a separate one (`3>` and `4>`) is controlled by the timeout.
```erlang
1> erpc:call(node(), fun() -> ets:new(foo, [named_table, private]) end, infinity).
foo
2> ets:tab2list(foo).
[]
3> erpc:call(node(), fun() -> ets:new(bar, [named_table, private]) end, 1000).
bar
4> ets:tab2list(bar).
** exception error: bad argument
     in function  ets:match_object/2
        called as ets:match_object(bar,'_')
        *** argument 1: the table identifier does not refer to an existing ETS table
     in call from ets:tab2list/1 (ets.erl, line 2266)
```
As `2>` shows, the `ets` table created by the call at `1>` still exists and is accessible to the calling process. Conversely, at `4>` the `ets` table created by the call at `3>` is gone.
The second example shows the function being interrupted when the calling process exits. A spawned process sets a timer to kill itself after 1 s, then uses `erpc:call` to execute a function that waits 2 s before sending `done` back to the shell.
```erlang
1> Self = self().
<0.81.0>
2> spawn(fun() -> timer:kill_after(1000, self()), erpc:call(node(), fun() -> timer:sleep(2000), Self ! done end, infinity) end).
<0.108.0>
3> receive done -> ok after 5000 -> error end.
error
4> spawn(fun() -> timer:kill_after(1000, self()), erpc:call(node(), fun() -> timer:sleep(2000), Self ! done end, 5000) end).
<0.104.0>
5> receive done -> done after 5000 -> error end.
done
```
As `3>` shows, the `done` message never arrives when the calling process exits during the wait period of the function started at `2>`. Conversely, at `5>` we receive the `done` message even though the calling process exited.
Expected behavior
`erpc:call` should behave the same regardless of the node or timeout. Specifically, it should always execute the given function in a separate process.
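For illustration, here is a minimal sketch of the expected local behavior, assuming a hypothetical helper module `isolated_call` (not part of OTP, and not a claim about how `erpc` is implemented). The fun always runs in a fresh, monitored process, and its result or exception is carried back in that process's exit reason, so the caller only returns after the worker has terminated.

```erlang
%% Hypothetical helper module, NOT part of OTP: always runs the fun in a
%% fresh, monitored (not linked) process on the local node.
-module(isolated_call).
-export([call/1]).

-spec call(fun(() -> Result)) -> Result.
call(Fun) when is_function(Fun, 0) ->
    Ref = make_ref(),
    %% The worker reports its outcome through its exit reason, so the
    %% caller returns only after receiving the 'DOWN' message, i.e. after
    %% the worker has terminated.
    {Pid, MRef} =
        spawn_monitor(
          fun() ->
                  Outcome = try {return, Fun()}
                            catch Class:Reason:Stack -> {Class, Reason, Stack}
                            end,
                  exit({Ref, Outcome})
          end),
    receive
        {'DOWN', MRef, process, Pid, {Ref, {return, Value}}} ->
            Value;
        {'DOWN', MRef, process, Pid, {Ref, {Class, Reason, Stack}}} ->
            %% Re-raise the worker's exception in the caller.
            erlang:raise(Class, Reason, Stack);
        {'DOWN', MRef, process, Pid, Other} ->
            %% The worker died before reporting, e.g. it was killed.
            erlang:error({worker_down, Other})
    end.
```

Because the worker is monitored rather than linked, it keeps running if the caller dies, and process dictionary entries or flags it sets never touch the caller.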
Affected versions
OTP 27, but probably all the way back to OTP 23, when `erpc` was introduced.
Additional context
The optimization seems pointless to me. When the function is executed on the local node, the call is already fast; there is no need to shave off a few microseconds. Moreover, it only applies when the given `Timeout` is `infinity`, which indicates that the caller is prepared to wait however long it takes and is not in a hurry.