
Closing the socket client-side results in SIGPIPE and an R shutdown that removes the tempdir

Status: Open. rfaelens opened this issue 6 years ago • 6 comments

When I connect to Rserve on Linux using 5 client-side connections
And I close the 3rd one forcefully (socket.close() in Java)
Then R receives SIGPIPE
And the R tempdir is removed (R_CleanTempDir is called)

Rserve should set its own SIGPIPE handler and fail gracefully.

I am still trying to build a minimal reproducible example, but it is difficult to pinpoint the exact cause of the SIGPIPE.

rfaelens, Mar 08 '18 01:03
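A minimal sketch of the underlying mechanism, in plain C rather than Rserve code: writing to a stream socket whose peer has already been closed raises SIGPIPE, and with the default disposition that signal terminates the writing process. It uses an AF_UNIX `socketpair()` instead of TCP so the outcome is deterministic; over TCP, the first `send()` after the client closes can still succeed and only a later one fails with EPIPE. The helper name `sigpipe_repro` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the signal that killed a child process
 * which wrote to a socket whose peer was already closed, 0 if the child
 * exited normally, or -1 if setup failed. */
static int sigpipe_repro(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    close(sv[1]);                       /* the "client" closes its end */

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        return -1;
    }
    if (pid == 0) {
        /* Child plays the server: restore the default (terminating)
         * SIGPIPE disposition and make sure the signal is not blocked. */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGPIPE);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        signal(SIGPIPE, SIG_DFL);

        static const char reply[] = "response";
        send(sv[0], reply, sizeof reply, 0);   /* flags 0: SIGPIPE is raised */
        _exit(0);                              /* reached only if not killed */
    }

    close(sv[0]);
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux, `sigpipe_repro()` should return `SIGPIPE`, the same signal reported in the strace output that follows.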

Based on an strace of my application, this always happens in the same function:

15480 13:25:35 [00007f527a6d430d] sendto(4, "\x0a\x08\x01\x00\xa2\x04\x01\x00\x15\xc4\x00\x00\x22\x0c\x00\x00\x74\x72\x79\x2d\x65\x72\x72\x6f\x72\x00\x01\x01\x13\x08\x00\x00"..., 268, 0, NULL, 0) = -1 EPIPE (Broken pipe)
 > /usr/lib64/libc-2.17.so(__send+0x1d) [0xf930d]
 > /usr/lib64/R/bin/Rserve(server_send+0xe) [0x494e]
 > /usr/lib64/R/bin/Rserve(Rserve_QAP1_send_resp+0x9f) [0x4a4f]
 > /usr/lib64/R/bin/Rserve(Rserve_QAP1_connected+0xeb9) [0x7a29]
 > /usr/lib64/R/bin/Rserve(serverLoop+0x2bc) [0xa01c]
 > /usr/lib64/R/bin/Rserve(main+0x35b) [0x328b]
 > /usr/lib64/libc-2.17.so(__libc_start_main+0xf5) [0x21c05]
 > /usr/lib64/R/bin/Rserve(_start+0x29) [0x3d29]
15480 13:25:35 [00007f527a6d430d] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=15480, si_uid=1000}

rfaelens, Mar 08 '18 13:03

It's not possible to set the SIGPIPE handler, because R itself resets it continuously, so apps/packages cannot touch it.

There are two options:

  1. use set.tempdir for unique temp dirs (useful in particular when you use user switching; this is what we do in RCloud)
  2. use something like if (!dir.exists(tempdir())) dir.create(tempdir(),,TRUE), although that's not 100% safe in multi-user environments (another process could still delete the directory after you started running)

s-u, Mar 08 '18 15:03
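Option 2 is a check-then-recreate step. For code running inside the server process, the same idea can be sketched in C as below. The helper name `ensure_dir` and the 0700 mode are assumptions, and the `stat()` check followed by `mkdir()` carries exactly the race described in option 2: another process can remove the directory again right after the check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical C counterpart of
 *   if (!dir.exists(d)) dir.create(d, recursive = TRUE)
 * Returns 0 if `path` exists as a directory afterwards, -1 otherwise.
 * Not race-free: the directory can vanish again right after we return. */
static int ensure_dir(const char *path)
{
    struct stat st;
    char buf[4096];
    size_t len = strlen(path);

    assert(len > 0);
    if (stat(path, &st) == 0)               /* dir.exists() */
        return S_ISDIR(st.st_mode) ? 0 : -1;
    if (len >= sizeof buf) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(buf, path, len + 1);

    /* recursive = TRUE: create each missing intermediate component. */
    for (char *p = buf + 1; *p != '\0'; p++) {
        if (*p != '/')
            continue;
        *p = '\0';
        if (mkdir(buf, 0700) != 0 && errno != EEXIST)
            return -1;
        *p = '/';
    }
    if (mkdir(buf, 0700) != 0 && errno != EEXIST)
        return -1;
    return (stat(buf, &st) == 0 && S_ISDIR(st.st_mode)) ? 0 : -1;
}
```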

I'll see if there is a way to insert a handler before R shutdown so that we can set the tempdir to /dev/null to avoid the deletion.

s-u, Mar 08 '18 15:03

Thanks for the comments. I do not understand how Rserve gets from a SIGPIPE in the Rserve code to R_CleanTempDir in R's main loop. Is libunwind/strace reporting a bogus stack, or am I misunderstanding something?

3489  15:38:48 [00007fc9d186b37d] rt_sigaction(SIGPIPE, {sa_handler=0x7fc9d1d41cd0, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fc9d186b270}, {sa_handler=0x7fc9d1d41cd0, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fc9d186b270}, 8) = 0
 > /usr/lib64/libc-2.17.so(__GI___libc_sigaction+0xfd) [0x3537d]
 > /usr/lib64/libc-2.17.so(signal+0x66) [0x35186]
 > /usr/lib64/R/lib/libR.so(locale2charset+0x28c5) [0x148ce5]
 > /usr/lib64/libc-2.17.so(killpg+0x40) [0x35270]
 > /usr/lib64/libc-2.17.so(__send+0x1d) [0xf930d]
 > /usr/lib64/R/bin/Rserve(server_send+0xe) [0x494e]
 > /usr/lib64/R/bin/Rserve(Rserve_QAP1_send_resp+0x9f) [0x4a4f]
 > /usr/lib64/R/bin/Rserve(Rserve_QAP1_connected+0xeb9) [0x7a29]
 > /usr/lib64/R/bin/Rserve(serverLoop+0x2bc) [0xa01c]
 > /usr/lib64/R/bin/Rserve(main+0x35b) [0x328b]
 > /usr/lib64/libc-2.17.so(__libc_start_main+0xf5) [0x21c05]
 > /usr/lib64/R/bin/Rserve(_start+0x29) [0x3d29]
3489  15:38:48 [00007fc9d186b37d] rt_sigaction(SIGINT, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fc9d186b270}, {sa_handler=0x7fc9d1d41cb0, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fc9d186b270}, 8) = 0
 > /usr/lib64/libc-2.17.so(__GI___libc_sigaction+0xfd) [0x3537d]
 > /usr/lib64/libc-2.17.so(do_system+0x90) [0x41be0]
 > /usr/lib64/R/lib/libR.so(R_system+0x6) [0x1c0a96]
 > /usr/lib64/R/lib/libR.so(R_CleanTempDir+0x5a) [0x213e9a]
 > /usr/lib64/R/lib/libR.so(R_CleanTempDir+0xe5) [0x213f25]
 > /usr/lib64/R/lib/libR.so(setup_Rmainloop+0x5ec) [0x149d8c]
 > unexpected_backtracing_error [0x6]

rfaelens, Mar 08 '18 18:03
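The first trace may be less mysterious than it looks. The frame labelled killpg+0x40 sits at offset 0x35270, which is exactly the sa_restorer address from the rt_sigaction call (0x7fc9d186b270 minus the libc base 0x7fc9d1836000 implied by __GI___libc_sigaction+0xfd at 0x7fc9d186b37d), so it is most likely the signal-return trampoline rather than a real call to killpg(). In other words, R's SIGPIPE handler (a libR.so function the unwinder labels locale2charset+0x28c5) ran on Rserve's stack in the middle of send() and reinstalled itself with signal(). If that handler then raises an R error, control leaves through a non-local jump to R's top-level context, which would be consistent with the later frames under setup_Rmainloop and R_CleanTempDir. The sketch below reproduces that control flow in plain C with `sigsetjmp()`/`siglongjmp()`; `toplevel`, `on_sigpipe`, `send_response` and `run_demo` are hypothetical stand-ins, not R or Rserve internals.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the top-level jump target a host sets up. */
static sigjmp_buf toplevel;

/* Hypothetical stand-in for a handler that turns SIGPIPE into an error,
 * i.e. a non-local jump back to the top level. */
static void on_sigpipe(int sig)
{
    (void)sig;
    siglongjmp(toplevel, 1);
}

/* Deep inside the "server": reply to a client that has gone away. */
static void send_response(int fd)
{
    static const char reply[] = "response";
    send(fd, reply, sizeof reply, 0);   /* SIGPIPE: does not return here */
}

/* Returns 1 if control came back through the handler's jump, 0 if
 * send_response() returned normally, -1 if setup failed. */
static int run_demo(void)
{
    struct sigaction sa, old;
    sigset_t unblock;
    int sv[2];
    int jumped;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigpipe;
    sigemptyset(&sa.sa_mask);
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGPIPE);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    close(sv[1]);                       /* the client closes its end */
    if (sigaction(SIGPIPE, &sa, &old) != 0) {
        close(sv[0]);
        return -1;
    }
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);

    if (sigsetjmp(toplevel, 1) == 0) {
        send_response(sv[0]);
        jumped = 0;                     /* send() returned normally */
    } else {
        /* We resume here, far from send(): a host would run its
         * top-level error and clean-up logic at this point. */
        jumped = 1;
    }

    sigaction(SIGPIPE, &old, NULL);     /* restore the previous handler */
    close(sv[0]);
    return jumped;
}
```

`run_demo()` should return 1: the stack frames of `send_response()` are simply abandoned, which is why a backtrace taken afterwards no longer shows a path from send() to the clean-up code.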

After testing: set.tempdir does not work.

R_CleanTempDir cleans the directory stored in Sys_TempDir. That variable is set in InitTempDir and is currently not modified by unixtools (see src/main/sysutils.c and src/unix/sys-std.c).

In my humble opinion, the right way to solve this is to set R_ignore_SIGPIPE around any internal code that calls send() or recvfrom(). See src/main/main.c in the R source tree:

/* this flag is set if R internal code is using send() and does not
   want to trigger an error on SIGPIPE (e.g., the httpd code).
   [It is safer and more portable than other methods of handling
   broken pipes on send().]
 */

#ifndef Win32
// controlled by the internal http server in the internet module
int R_ignore_SIGPIPE = 0;

See also R's internal HTTP server in src/modules/internet/Rhttpd.c for an example. Rserve falls into the same class.

rfaelens, Mar 09 '18 08:03
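The R_ignore_SIGPIPE approach works because SIGPIPE is raised for the calling thread during the send() system call and handled before send() returns, so the handler can consult a flag the caller set just before the call. The sketch below shows that pattern in self-contained C. The flag `ignore_sigpipe` stands in for R_ignore_SIGPIPE, and `errors_raised` counts the cases where R's handler would presumably have raised an error; these names, `install_handler()` and `quiet_send()` are hypothetical, and this is not the code in Rhttpd.c or Rserve.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins: ignore_sigpipe plays the role of
 * R_ignore_SIGPIPE; errors_raised counts where R would call error(). */
static volatile sig_atomic_t ignore_sigpipe = 0;
static volatile sig_atomic_t errors_raised = 0;

static void on_sigpipe(int sig)
{
    (void)sig;
    if (!ignore_sigpipe)
        errors_raised = errors_raised + 1;   /* R: error() would longjmp */
    /* Otherwise just return: send() then fails with -1/EPIPE. */
}

static int install_handler(void)
{
    struct sigaction sa;
    sigset_t unblock;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigpipe;
    sigemptyset(&sa.sa_mask);
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGPIPE);
    if (sigaction(SIGPIPE, &sa, NULL) != 0)
        return -1;
    return sigprocmask(SIG_UNBLOCK, &unblock, NULL);
}

/* Raise the flag only for the duration of send(), so a broken pipe
 * surfaces as an ordinary EPIPE return value the caller can handle. */
static ssize_t quiet_send(int fd, const void *buf, size_t len)
{
    ignore_sigpipe = 1;
    ssize_t n = send(fd, buf, len, 0);
    ignore_sigpipe = 0;                 /* does not touch errno */
    return n;
}
```

With the flag set, the handler runs and returns, and send() reports EPIPE, which is the same sequence as the rt_sigaction/rt_sigreturn/EPIPE lines in the next trace.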

I did some tests with the new code. When a SIGPIPE is received now, the following happens:

3436  10:29:52 [00007fb764f56bad] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3436, si_uid=1000} ---
3436  10:29:52 [00007fb764bba37d] rt_sigaction(SIGPIPE, {sa_handler=0x7fb7652accd0, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fb764bba270},  <unfinished ...>
3460  10:29:52 [00007fb764f56bad] sendto(4, "\x01\x00\x01\x00\x48\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 16, 0, NULL, 0 <unfinished ...>
3436  10:29:52 [00007fb764bba37d] <... rt_sigaction resumed> {sa_handler=0x7fb7652accd0, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fb764bba270}, 8) = 0
 > /usr/lib64/libc-2.17.so(__GI___libc_sigaction+0xfd) [0x3537d]
 > /usr/lib64/libc-2.17.so(signal+0x66) [0x35186]
 > /usr/lib64/R/lib/libR.so(locale2charset+0x28c5) [0x148ce5]
 > /usr/lib64/libc-2.17.so(killpg+0x40) [0x35270]
 > /usr/lib64/libpthread-2.17.so(send+0x1d) [0xebad]
 > /usr/lib64/R/bin/Rserve(server_send+0x18) [0x55a8]
 > /usr/lib64/R/bin/Rserve(Rserve_QAP1_send_resp+0xa0) [0x54d0]
 > /usr/lib64/R/bin/Rserve(Rserve_QAP1_connected+0xda0) [0xc7e0]
 > /usr/lib64/R/bin/Rserve(serverLoop+0x262) [0xe242]
 > /usr/lib64/R/bin/Rserve(main+0x359) [0x47c9]
 > /usr/lib64/libc-2.17.so(__libc_start_main+0xf5) [0x21c05]
 > /usr/lib64/R/bin/Rserve(_start+0x29) [0x5367]
3436  10:29:52 [00007fb764bba279] rt_sigreturn({mask=[]} <unfinished ...>
3436  10:29:52 [00007fb764f56bad] <... rt_sigreturn resumed> ) = -1 EPIPE (Broken pipe)
 > /usr/lib64/libpthread-2.17.so(send+0x1d) [0xebad]
 > /usr/lib64/R/bin/Rserve(server_send+0x18) [0x55a8]
 > /usr/lib64/R/bin/Rserve(Rserve_QAP1_send_resp+0xa0) [0x54d0]
 > /usr/lib64/R/bin/Rserve(Rserve_QAP1_connected+0xda0) [0xc7e0]
 > /usr/lib64/R/bin/Rserve(serverLoop+0x262) [0xe242]
 > /usr/lib64/R/bin/Rserve(main+0x359) [0x47c9]
 > /usr/lib64/libc-2.17.so(__libc_start_main+0xf5) [0x21c05]
 > /usr/lib64/R/bin/Rserve(_start+0x29) [0x5367]
3436  10:29:52 [00007fb764f56a3d] recvfrom(4,  <unfinished ...>
3436  10:29:52 [00007fb764f56a3d] <... recvfrom resumed> "", 16, 0, NULL, NULL) = 0
 > /usr/lib64/libpthread-2.17.so(recv+0x1d) [0xea3d]
 > /usr/lib64/R/bin/Rserve(server_recv+0x18) [0x55c8]
 > /usr/lib64/R/bin/Rserve(Rserve_QAP1_connected+0x170) [0xbbb0]
 > /usr/lib64/R/bin/Rserve(serverLoop+0x262) [0xe242]
 > /usr/lib64/R/bin/Rserve(main+0x359) [0x47c9]
 > /usr/lib64/libc-2.17.so(__libc_start_main+0xf5) [0x21c05]
 > /usr/lib64/R/bin/Rserve(_start+0x29) [0x5367]
3436  10:29:52 [????????????????] +++ exited with 0 +++
1238  10:29:52 [00007fb764c74783] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3436, si_uid=1000, si_status=0, si_utime=4, si_stime=4} ---

The child now shuts down gracefully: send() fails with EPIPE, the next recv() returns 0 because the client closed the socket, and the process exits with status 0.

rfaelens, Mar 09 '18 09:03
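The last trace shows the desired sequence: send() returns -1 with EPIPE, the following recv() returns 0 (end of stream), and the child exits with status 0. A connection loop that treats those results as "client gone" rather than as fatal errors could look like the sketch below. `serve_connection()` and its 16-byte echo exchange are hypothetical simplifications (the 16 bytes only mirror the reads and writes in the trace), not Rserve's QAP1 code, and the sketch assumes SIGPIPE no longer kills or longjmps out of the process, which the test simulates by ignoring the signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>   /* callers use signal(SIGPIPE, SIG_IGN) for this sketch */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-connection loop: read up to 16 bytes, echo them back,
 * repeat. Returns 0 when the client has gone away and -1 on any other
 * error. Partial sends are not retried, for brevity. */
static int serve_connection(int fd)
{
    unsigned char buf[16];

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n == 0)
            return 0;                   /* orderly shutdown by the client */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return errno == ECONNRESET ? 0 : -1;
        }
        if (send(fd, buf, (size_t)n, 0) < 0 &&
            errno != EPIPE && errno != ECONNRESET)
            return -1;
        /* On EPIPE/ECONNRESET, fall through: the next recv() sees EOF. */
    }
}
```

Letting the next recv() observe end-of-stream, instead of bailing out on the failed send(), mirrors the order of system calls in the trace.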