Rserve
Closing the socket client-side results in SIGPIPE and an R shutdown, which removes the tempdir
When I connect to RServe on Linux using 5 client-side connections
And I close the 3rd one forcefully (socket.close() in Java)
Then R receives SIGPIPE
And the R tempdir is removed (R_CleanTempDir is called)
RServe should set its own signal handler for R_SIGPIPE, and fail gracefully.
Still trying to build a nice example, but it is difficult to pinpoint the exact cause of SIGPIPE.
Based on strace of my application, this always happens in the same method:
15480 13:25:35 [00007f527a6d430d] sendto(4, "\x0a\x08\x01\x00\xa2\x04\x01\x00\x15\xc4\x00\x00\x22\x0c\x00\x00\x74\x72\x79\x2d\x65\x72\x72\x6f\x72\x00\x01\x01\x13\x08\x00\x00"..., 268, 0, NULL, 0) = -1 EPIPE (Broken pipe)
> /usr/lib64/libc-2.17.so(__send+0x1d) [0xf930d]
> /usr/lib64/R/bin/Rserve(server_send+0xe) [0x494e]
> /usr/lib64/R/bin/Rserve(Rserve_QAP1_send_resp+0x9f) [0x4a4f]
> /usr/lib64/R/bin/Rserve(Rserve_QAP1_connected+0xeb9) [0x7a29]
> /usr/lib64/R/bin/Rserve(serverLoop+0x2bc) [0xa01c]
> /usr/lib64/R/bin/Rserve(main+0x35b) [0x328b]
> /usr/lib64/libc-2.17.so(__libc_start_main+0xf5) [0x21c05]
> /usr/lib64/R/bin/Rserve(_start+0x29) [0x3d29]
15480 13:25:35 [00007f527a6d430d] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=15480, si_uid=1000}
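For reference, the failing send() in the trace above can be reproduced in isolation. The following is only a minimal sketch, not Rserve code: it assumes Linux (MSG_NOSIGNAL is not available everywhere), uses a local socketpair() in place of a real client connection, and the helper name errno_after_send_to_closed_peer is made up for illustration. It shows that once the peer has closed, send() fails with EPIPE, and that without MSG_NOSIGNAL the same call would also raise SIGPIPE.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of Rserve): send on a socket whose peer
 * has already closed and return the errno that send() reported.
 * Returns 0 if the send unexpectedly succeeded, -1 if setup failed. */
int errno_after_send_to_closed_peer(void)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* The peer goes away, like socket.close() on the client side. */
    close(fds[1]);

    const char msg[] = "response";
    /* MSG_NOSIGNAL suppresses SIGPIPE for this one call, so the broken
     * pipe is reported only through the return value and errno. */
    ssize_t n = send(fds[0], msg, sizeof msg, MSG_NOSIGNAL);
    int err = (n < 0) ? errno : 0;

    close(fds[0]);
    return err;
}
```

Calling errno_after_send_to_closed_peer() should return EPIPE on Linux. Dropping MSG_NOSIGNAL (passing 0 as flags) with the default SIGPIPE disposition would terminate the process instead.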
It's not possible to set the SIGPIPE handler because R itself resets it continuously, so apps/packages cannot touch it.
There are two options:
- use set.tempdir for unique temp dirs (useful in particular when you use user-switching - this is what we do in RCloud)
- use something like
if (!dir.exists(tempdir())) dir.create(tempdir(),,TRUE)
although that's not 100% safe in multi-user environments (since another process could blow it away after you started running)
I'll see if there is a way to insert a handler before R shutdown so that we can set the tempdir to /dev/null to avoid the deletion.
Thanks for the comments. I do not understand how RServe can go from a SIGPIPE in the Rserve code to R_CleanTempDir in the main loop of R. Is libunwind/strace reporting the backtrace incorrectly, or did I fail to understand something?
3489 15:38:48 [00007fc9d186b37d] rt_sigaction(SIGPIPE, {sa_handler=0x7fc9d1d41cd0, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fc9d186b270}, {sa_handler=0x7fc9d1d41cd0, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fc9d186b270}, 8) = 0
> /usr/lib64/libc-2.17.so(__GI___libc_sigaction+0xfd) [0x3537d]
> /usr/lib64/libc-2.17.so(signal+0x66) [0x35186]
> /usr/lib64/R/lib/libR.so(locale2charset+0x28c5) [0x148ce5]
> /usr/lib64/libc-2.17.so(killpg+0x40) [0x35270]
> /usr/lib64/libc-2.17.so(__send+0x1d) [0xf930d]
> /usr/lib64/R/bin/Rserve(server_send+0xe) [0x494e]
> /usr/lib64/R/bin/Rserve(Rserve_QAP1_send_resp+0x9f) [0x4a4f]
> /usr/lib64/R/bin/Rserve(Rserve_QAP1_connected+0xeb9) [0x7a29]
> /usr/lib64/R/bin/Rserve(serverLoop+0x2bc) [0xa01c]
> /usr/lib64/R/bin/Rserve(main+0x35b) [0x328b]
> /usr/lib64/libc-2.17.so(__libc_start_main+0xf5) [0x21c05]
> /usr/lib64/R/bin/Rserve(_start+0x29) [0x3d29]
3489 15:38:48 [00007fc9d186b37d] rt_sigaction(SIGINT, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fc9d186b270}, {sa_handler=0x7fc9d1d41cb0, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fc9d186b270}, 8) = 0
> /usr/lib64/libc-2.17.so(__GI___libc_sigaction+0xfd) [0x3537d]
> /usr/lib64/libc-2.17.so(do_system+0x90) [0x41be0]
> /usr/lib64/R/lib/libR.so(R_system+0x6) [0x1c0a96]
> /usr/lib64/R/lib/libR.so(R_CleanTempDir+0x5a) [0x213e9a]
> /usr/lib64/R/lib/libR.so(R_CleanTempDir+0xe5) [0x213f25]
> /usr/lib64/R/lib/libR.so(setup_Rmainloop+0x5ec) [0x149d8c]
> unexpected_backtracing_error [0x6]
After testing: set.tempdir does not work. R_CleanTempDir cleans the directory specified in Sys_TempDir. This is set in InitTempDir and is currently not modified by unixtools. See src/main/sysutils.c and src/unix/sys-std.c.
The right way to solve this, in my humble opinion, is to set R_ignore_SIGPIPE on any internal code that is using send() and recvfrom(). See also src/main/main.c in the R source tree:
/* this flag is set if R internal code is using send() and does not
want to trigger an error on SIGPIPE (e.g., the httpd code).
[It is safer and more portable than other methods of handling
broken pipes on send().]
*/
#ifndef Win32
// controlled by the internal http server in the internet module
int R_ignore_SIGPIPE = 0;
See also the example of the internal HTTP server within R, src/modules/internet/Rhttpd.c. RServe should fall in the same class.
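To make the flag-based idea concrete, here is a rough standalone sketch of the pattern. It is not R's or Rserve's actual code: ignore_sigpipe stands in for R_ignore_SIGPIPE, and on_sigpipe, install_sigpipe_handler, guarded_send and demo_guarded_send are hypothetical names. The sketch installs its own handler only so that it is self-contained; inside R the handler belongs to R, and calling code would only toggle the flag around send().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins, NOT R's real symbols. */
static volatile sig_atomic_t ignore_sigpipe = 0;     /* set while inside send() */
static volatile sig_atomic_t unguarded_sigpipes = 0; /* SIGPIPEs seen with the flag clear */

static void on_sigpipe(int signo)
{
    (void)signo;
    if (!ignore_sigpipe)
        unguarded_sigpipes++;  /* a real server would escalate here */
}

/* Install the handler once, e.g. at start-up. Returns 0 on success. */
int install_sigpipe_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigpipe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGPIPE, &sa, NULL);
}

/* send() wrapper: with the flag set, a broken pipe surfaces only as an
 * ordinary -1/EPIPE return that the caller can handle. */
ssize_t guarded_send(int fd, const void *buf, size_t len)
{
    sig_atomic_t saved = ignore_sigpipe;
    ignore_sigpipe = 1;
    ssize_t n = send(fd, buf, len, 0);
    int err = errno;           /* keep errno across the flag restore */
    ignore_sigpipe = saved;
    errno = err;
    return n;
}

/* Demo: send to a closed peer and return the errno observed
 * (0 if the send succeeded, -1 if setup failed). */
int demo_guarded_send(void)
{
    int fds[2];
    if (install_sigpipe_handler() != 0) return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return -1;
    close(fds[1]);             /* the client disappears */
    ssize_t n = guarded_send(fds[0], "x", 1);
    int err = (n < 0) ? errno : 0;
    close(fds[0]);
    return err;
}
```

In this sketch demo_guarded_send() should return EPIPE while the process keeps running, which is the behaviour a server wants: the failed write is handled at the call site rather than in a process-wide error path.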
I did some tests with the new code. When receiving a SIGPIPE now, the following happens:
3436 10:29:52 [00007fb764f56bad] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3436, si_uid=1000} ---
3436 10:29:52 [00007fb764bba37d] rt_sigaction(SIGPIPE, {sa_handler=0x7fb7652accd0, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fb764bba270}, <unfinished ...>
3460 10:29:52 [00007fb764f56bad] sendto(4, "\x01\x00\x01\x00\x48\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 16, 0, NULL, 0 <unfinished ...>
3436 10:29:52 [00007fb764bba37d] <... rt_sigaction resumed> {sa_handler=0x7fb7652accd0, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fb764bba270}, 8) = 0
> /usr/lib64/libc-2.17.so(__GI___libc_sigaction+0xfd) [0x3537d]
> /usr/lib64/libc-2.17.so(signal+0x66) [0x35186]
> /usr/lib64/R/lib/libR.so(locale2charset+0x28c5) [0x148ce5]
> /usr/lib64/libc-2.17.so(killpg+0x40) [0x35270]
> /usr/lib64/libpthread-2.17.so(send+0x1d) [0xebad]
> /usr/lib64/R/bin/Rserve(server_send+0x18) [0x55a8]
> /usr/lib64/R/bin/Rserve(Rserve_QAP1_send_resp+0xa0) [0x54d0]
> /usr/lib64/R/bin/Rserve(Rserve_QAP1_connected+0xda0) [0xc7e0]
> /usr/lib64/R/bin/Rserve(serverLoop+0x262) [0xe242]
> /usr/lib64/R/bin/Rserve(main+0x359) [0x47c9]
> /usr/lib64/libc-2.17.so(__libc_start_main+0xf5) [0x21c05]
> /usr/lib64/R/bin/Rserve(_start+0x29) [0x5367]
3436 10:29:52 [00007fb764bba279] rt_sigreturn({mask=[]} <unfinished ...>
3436 10:29:52 [00007fb764f56bad] <... rt_sigreturn resumed> ) = -1 EPIPE (Broken pipe)
> /usr/lib64/libpthread-2.17.so(send+0x1d) [0xebad]
> /usr/lib64/R/bin/Rserve(server_send+0x18) [0x55a8]
> /usr/lib64/R/bin/Rserve(Rserve_QAP1_send_resp+0xa0) [0x54d0]
> /usr/lib64/R/bin/Rserve(Rserve_QAP1_connected+0xda0) [0xc7e0]
> /usr/lib64/R/bin/Rserve(serverLoop+0x262) [0xe242]
> /usr/lib64/R/bin/Rserve(main+0x359) [0x47c9]
> /usr/lib64/libc-2.17.so(__libc_start_main+0xf5) [0x21c05]
> /usr/lib64/R/bin/Rserve(_start+0x29) [0x5367]
3436 10:29:52 [00007fb764f56a3d] recvfrom(4, <unfinished ...>
3436 10:29:52 [00007fb764f56a3d] <... recvfrom resumed> "", 16, 0, NULL, NULL) = 0
> /usr/lib64/libpthread-2.17.so(recv+0x1d) [0xea3d]
> /usr/lib64/R/bin/Rserve(server_recv+0x18) [0x55c8]
> /usr/lib64/R/bin/Rserve(Rserve_QAP1_connected+0x170) [0xbbb0]
> /usr/lib64/R/bin/Rserve(serverLoop+0x262) [0xe242]
> /usr/lib64/R/bin/Rserve(main+0x359) [0x47c9]
> /usr/lib64/libc-2.17.so(__libc_start_main+0xf5) [0x21c05]
> /usr/lib64/R/bin/Rserve(_start+0x29) [0x5367]
3436 10:29:52 [????????????????] +++ exited with 0 +++
1238 10:29:52 [00007fb764c74783] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3436, si_uid=1000, si_status=0, si_utime=4, si_stime=4} ---
The child now gracefully shuts down, as it nicely detects the socket was closed.
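The last part of the trace (recvfrom returning 0, then a clean exit) corresponds to the usual way of detecting a closed peer. The following is a minimal sketch of such a loop under stated assumptions, not Rserve's implementation: HEADER_LEN is taken from the 16-byte reads in the trace, and read_full, serve_until_closed and demo_serve_until_closed are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum { HEADER_LEN = 16 };  /* assumption: matches the 16-byte reads in the trace */

/* Read exactly len bytes. Returns 1 on success, 0 if the peer closed
 * the connection first, -1 on error. */
static int read_full(int fd, unsigned char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, buf + got, len - got, 0);
        if (n == 0) return 0;              /* orderly close by the peer */
        if (n < 0) {
            if (errno == EINTR) continue;  /* interrupted, retry */
            return -1;
        }
        got += (size_t)n;
    }
    return 1;
}

/* Hypothetical connection loop: consume headers until the peer closes.
 * Returns the number of complete headers seen, or -1 on error. */
int serve_until_closed(int fd)
{
    unsigned char header[HEADER_LEN];
    int handled = 0;
    for (;;) {
        int r = read_full(fd, header, sizeof header);
        if (r < 0) return -1;
        if (r == 0) return handled;        /* socket closed: leave the loop cleanly */
        handled++;                         /* a real server would dispatch here */
    }
}

/* Demo: a fake client sends two headers and closes its end. */
int demo_serve_until_closed(void)
{
    int fds[2];
    unsigned char header[HEADER_LEN] = {0};
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return -1;
    for (int i = 0; i < 2; i++)
        if (send(fds[1], header, sizeof header, 0) != (ssize_t)sizeof header)
            return -1;
    close(fds[1]);
    int handled = serve_until_closed(fds[0]);
    close(fds[0]);
    return handled;
}
```

Here the loop ends through the normal return path once recv() reports 0, so any cleanup runs as part of a regular exit instead of a signal-triggered shutdown.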