gproc gproc_pool:claim doesn't return worker when calling process is killed

I've found gproc_pool very flexible vs other pool libraries, but found this behaviour

48>  gproc_pool:claim(pool, fun(_, P) ->  P end).
{true,<0.185.0>}
49> Pid = spawn_link(fun() -> gproc_pool:claim(pool, fun(_, P) -> timer:sleep(10000), P en
d) end).
<0.223.0>
50> exit(Pid, kill).
** exception exit: killed
51>  gproc_pool:claim(pool, fun(_, P) ->  P end).
false
69> gproc_pool:active_workers(pool).
[{a,<0.185.0>}]

After killing the process, worker stay busy forever. I look through the code and found that it is expected behaviour. Am I right? Are there any plans to improve it?

Apr 01 '16 19:04 netDalek

It's true that the code doesn't handle the case of a process inside a claim operation gets killed from the outside. Strictly speaking, this is a bug.

Fixing it will cause a general slowdown, but I think I found a way that is still cheap enough. See PR #110. Let me know what you think.

Apr 02 '16 09:04 uwiger

Looks good and simple, can't find disadvantages

Apr 02 '16 20:04 netDalek

Describing it to my son, I came up with some failure scenarios that seem very hard to address.

Basically, the approach is inherently vulnerable to the calling process being murdered while running inside the claim. Starting from the (correct) assumption that the process could be killed after any given call, at least the following scenarios, however unlikely, come to mind:

Line 509, just after getting the lock, but before spawning the monitoring process. A solution could be to start monitoring before trying the lock, but this seems to forfeit the purpose of the approach.
Line 551, if the calling process is killed just after killing the monitoring process, but before resetting the counter, the counter will never be reset.
If lines 551 & 552 are switched, and the calling process resets the counter then dies before killing the monitoring process, the monitor will reset the counter again, possibly after another process has claimed it.

Considering the above, one could either conclude that the claim() operation is broken and shouldn't be used, or decide to live with the probability that it sometimes breaks, and then decide which types of breakage are least desirable (I'd wager that the counter never getting reset is the most serious).

Alternatively, figure out a different approach that's safe and not so much slower that the function becomes unfit for its purpose.

Apr 03 '16 07:04 uwiger

(This would be a good time to play around with QuickCheck/Pulse ...)

Apr 03 '16 07:04 uwiger

Since it certainly is an improvement over the previous implementation, I decided to merge. We can try to eliminate the (very slight) holes in the algorithm later.

Apr 03 '16 19:04 uwiger

Line 551, if the calling process is killed just after killing the monitoring process, but before resetting the counter, the counter will never be reset.

May be it's better in place of kill + reset_counter send a message to spawned process.

execute_claim(F, K, Pid) ->
     Parent = self(),
     Mon = spawn(
             fun() ->
                     Ref = erlang:monitor(process, Parent),
                     receive
                         {'DOWN', Ref, _, _, _} ->
                             gproc:reset_counter(K)
                         exit ->
                             gproc:reset_counter(K)
                     end
             end),
     try begin
             Res = F(K, Pid),
             {true, Res}
         end
     after
         Pid ! exit
     end.

Apr 04 '16 09:04 netDalek

I thought about that, but it complicates things, is not inherently safe, and also introduces significant latency (assuming the monitoring process always does the reset).

It would probably at least be a good idea to switch lines 551 and 552, though. I forgot to do that.

Apr 04 '16 09:04 uwiger

gproc gproc copied to clipboard

gproc_pool:claim doesn't return worker when calling process is killed

gproc
gproc copied to clipboard