cosmopolitan icon indicating copy to clipboard operation
cosmopolitan copied to clipboard

Let signals interrupt fgets unless SA_RESTART set

Open cloudrac3r opened this issue 2 months ago • 5 comments

See investigation in #1130.

fgets internally calls readv. readv is a @restartable function that understands SA_RESTART. If SA_RESTART is set, readv already handles restarting the system call and eventually the string is transparently returned to the fgets caller.

When SA_RESTART is not set, -1 EINTR should bubble up to the fgets caller. However, until this commit, fgets itself would detect EINTR and keep retrying until it read an entire line.

This commit fixes this behaviour so that fgets understands SA_RESTART.

cloudrac3r avatar Apr 23 '24 02:04 cloudrac3r

Thank you for the thorough investigation and fix.

To ensure SA_RESTART is working as expected / the EINTR hack is not needed, could you please write some tests? I was thinking something to the effect of having a thread run fgets in a loop and having another thread raise signals and verifying the contents are read as expected (with SA_RESTART), and EINTR is returned sometimes otherwise. I realize this isn't the easiest thing to test, but suspect the current behavior is there for a reason.

I'm not sure if waiving copyright/putting it in public domain is enough to merge this. If you don't have any objections could you assign copyright to Justine as described here: https://github.com/jart/cosmopolitan/blob/master/CONTRIBUTING.md#copyright-assignment ? Also please note it here so I can confirm.

G4Vi avatar Apr 24 '24 02:04 G4Vi

I have now done the copyright assignment process, and have checked in my test file with a copyright notice under my own name as encouraged.

I haven't written C since university so I hope my test file is doing the right thing. I will welcome any suggestions.

I did clang-format and valgrind, both tools are happy with my code. I also made sure that the test only passes when fgets_unlocked.c has the fix. If you undo the fix in fgets_unlocked.c, you'll see the test fail.

I was not able to figure out how to run the entire test suite and couldn't find any instructions. The output of make test looked like I need to have 7 computers with the correct operating systems already installed, which I'm not able to do. In theory my change should be good across platforms, so fingers crossed.

cloudrac3r avatar Apr 26 '24 01:04 cloudrac3r

You can run the test suite just by making items under o/$MODE/test for example make MODE= o//test/libc/stdio builds and runs the stdio tests or make MODE= o//test/libc/stdio/fgets_interrupt_test.runs builds and runs just your new test.

Unfortunately it looks like your test needs tweaks as it fails on my Linux system and on the CI runs above:

sample@frontierjustice:~/repos/cosmopolitan$ make MODE= o//test/libc/stdio
error:test/libc/stdio/fgets_interrupt_test.c:104: assertStringNotEquals() in fgets_eintr_testThatTheSignalInterruptsFgets() on frontierjustice pid 32939 tid 32939
        buf
                need "Hello world!" =
                 got "Hello world!"
        No error information
        o//test/libc/stdio/fgets_interrupt_test
error:test/libc/stdio/fgets_interrupt_test.c:105: fgets_eintr_testThatTheSignalInterruptsFgets() on frontierjustice pid 32939 tid 32939
        EXPECT_EQ(EINTR, errno)
                need 4 (or 0x04 or '\4' or EINTR) =
                 got 0 (or 0x0 or '\0')
        No error information
        o//test/libc/stdio/fgets_interrupt_test @ frontierjustice
2 / 9 tests failed

`make MODE= -j16 o//test/libc/stdio/fgets_interrupt_test.runs` exited with 2:
o//test/libc/stdio/fgets_interrupt_test
consumed 817µs wall time
ballooned to 596kb in size
needed 675us cpu (0% kernel)
caused 82 page faults (100% memcpy)
4 context switches (100% consensual)

make: *** [build/rules.mk:133: o//test/libc/stdio/fgets_interrupt_test.runs] Error 2

G4Vi avatar Apr 26 '24 06:04 G4Vi

Looks like the o//test/libc/stdio/fgets_interrupt_test executable succeeds on my build server but fails on my laptop. Seems like adding --strace on my laptop can sometimes cause it to succeed. I think there might be a race condition in the test. I'll see if I can figure it out.

cloudrac3r avatar Apr 26 '24 14:04 cloudrac3r

I couldn't figure out what was causing the race condition. Setting sched_setaffinity to a single core is the only way I could find to fix it reliably. Sorry for resorting to an inelegant hack.

cloudrac3r avatar Apr 27 '24 14:04 cloudrac3r