Close more FDs because XRootd plays games in library init
Also fix an oversight in EVPath remote
@abh3 Do you have any idea what is going on here?
Not at all and certainly xrootd is not at fault here. The linker obviously is going to open loads of FD's to service library dependencies during library initialization. That is completely normal and I don't see why it should interfere with CTest. Anyway, closing arbitrary FD's without knowing why they were opened is surely going to cause more problems than a CTest failure.
I'll get you more info when I can. This is a program (an ADIOS data server) that uses ADIOS but doesn't call any Xrootd client routines, yet it behaves differently when ADIOS is linked with those Xrootd routines available than when it is not (even though they are never called). Closing stdout and stderr is necessary when the server backgrounds itself while running as a fixture under CTest. If the server doesn't do that, CTest thinks the subcommand hasn't finished. Somehow more FDs get opened when linked with xrootd, and closing them resolves the problem. I need to do some tracing to see exactly when those get opened. (I'm also trying to sort out which libraries we're really pulling in, whether those are just client libraries, etc. We also need to have cmake find the xrootd executable so we can test what we're doing in our own CI, but having linking with xrootd not break the existing setup is the first step.)
So, I added a couple of lines to our (non-xrootd) remote data server right after main() like this:
int main(int argc, char **argv)
{
. . .
int nextfd = open("/dev/tty", O_WRONLY);
printf("Next available fd after main was %d\n", nextfd);
When this program is run without xrootd linked in to ADIOS, the output is as one might expect:
eisen@Endor build % bin/adios2_remote_server
Next available fd after main was 3
I.e., stdin, stdout and stderr were open upon entry to main, so the next available FD was 3. But when ADIOS is linked with xrootd, this is the output:
eisen@Endor build % bin/adios2_remote_server
Next available fd after main was 13
I suspected that something might be going on in library initialization and in particular with logging, so I ran this in a debugger with breakpoints set at likely candidates (open(), dup(), dup2(), etc.). This led me to:
XrdSysFD.hh:
inline int XrdSysFD_Dup(int oldfd)
{int newfd = dup(oldfd);
 if (newfd >= 0) fcntl(newfd, F_SETFD, FD_CLOEXEC);
 return newfd;
}
This is called 7 times, each time from the constructor of an XrdSysLogger object during xrootd library initialization, and each time calling dup on stderr. (A couple of other opens during network initialization account for some of the other FDs.)
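To show the effect in isolation, here is a minimal sketch. It is not XRootD code: `DupLogger` is a hypothetical stand-in for a logger whose constructor dups stderr, and `NextAvailableFd` is a made-up helper that mirrors the probe I added to main(). Each file-scope static consumes one descriptor before main() starts.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical stand-in for a logger whose constructor duplicates stderr,
// similar in spirit to what the debugger showed (not the real XrdSysLogger).
struct DupLogger
{
    int fd;
    DupLogger() : fd(dup(STDERR_FILENO))
    {
        if (fd >= 0)
            fcntl(fd, F_SETFD, FD_CLOEXEC);
    }
};

// File-scope statics are constructed during static initialization,
// i.e. before main() runs, so each one takes a descriptor early.
static DupLogger loggerA;
static DupLogger loggerB;

// Probe: open() returns the lowest unused descriptor number.
// The probe descriptor is closed again so the probe itself leaks nothing.
int NextAvailableFd()
{
    int fd = open("/dev/null", O_WRONLY);
    assert(fd >= 0); // demo only: /dev/null is assumed to be openable
    close(fd);
    return fd;
}
```

Calling `NextAvailableFd()` as the first statement of main() reports a number above both logger descriptors, even though main() itself has not opened anything yet.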
So we have a lot of instances where stderr is open on an FD other than 2. Normally this isn't a problem, but it caused an issue because of the particular way we ran our adios2_remote_server as a ctest fixture in our test suite. Fixtures need a "setup" command, which is run like a normal test, but which in this case must start up a server that will run in the background (and will exit when the "shutdown" command is run). For adios2_remote_server, we had a -background flag that simply had it fork() itself: the child continued in the background while the parent exited, so that the "setup" test would exit. However, because ctest reads the stdout and stderr of all its tests, it does more than just wait for the test process to exit; it also waits for an EOF on the stdout and stderr of that test. Because those FDs are inherited by the child of the test, CTest doesn't get an EOF on them unless the child closes them too. That's why we were originally closing FDs 0, 1, and 2. What the xrootd libraries dup'ing stderr upon initialization changed for us is that stderr now has to be closed in a lot more places before CTest gets that EOF.
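The EOF behavior can be reproduced without CTest or fork(). The following is a single-process sketch under my own assumptions: the read end of a pipe plays the role of CTest, the write end plays the role of the test's stderr, and `SeesEofWithin` and `EofAfterClosing` are hypothetical helper names. The reader only sees EOF once every copy of the write end is closed, including any dup'ed copy.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Returns true if readFd reports end-of-file within timeoutMs milliseconds,
// which only happens once every copy of the write end has been closed.
bool SeesEofWithin(int readFd, int timeoutMs)
{
    struct pollfd p = {readFd, POLLIN, 0};
    if (poll(&p, 1, timeoutMs) <= 0)
        return false; // timed out (or error): some writer is still open
    char c;
    return read(readFd, &c, 1) == 0; // a zero-byte read means EOF
}

// The pipe's write end stands in for the test's stderr, the read end for CTest.
bool EofAfterClosing(bool closeDuplicateToo)
{
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0); // demo only
    (void)rc;

    int extra = dup(fds[1]); // like a logger dup'ing stderr at library init
    close(fds[1]);           // like closing FD 2 in the backgrounded child
    if (closeDuplicateToo)
        close(extra);        // the additional close that was needed

    bool eof = SeesEofWithin(fds[0], 200);

    if (!closeDuplicateToo)
        close(extra);
    close(fds[0]);
    return eof;
}
```

`EofAfterClosing(false)` corresponds to the hang we saw: the original descriptor is closed, but the duplicate keeps the pipe open and the reader never gets its EOF.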
One thing to keep in mind is that this process (and ADIOS in this circumstance) has not called and will not call any xrootd subroutines. So while I'm certain that our closing these FDs would likely mess up XrdSysLogger, that's not a particular concern for us because it'll never be active anyway. It's just that the extra 7 dups of stderr and their interaction with ctest were unexpected.
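For concreteness, the kind of cleanup being discussed for the backgrounded child could look like the sketch below. This is not the exact change in this PR: `CloseFdRange` and `IsOpen` are hypothetical helper names, and the upper bound of the range is something the caller has to choose.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Close every descriptor in [first, last] and return how many were open.
// Descriptors that are not open just make close() fail, which is ignored.
int CloseFdRange(int first, int last)
{
    assert(first <= last);
    int closed = 0;
    for (int fd = first; fd <= last; ++fd)
    {
        if (close(fd) == 0)
            ++closed;
    }
    return closed;
}

// True if fd currently refers to an open file description.
bool IsOpen(int fd) { return fcntl(fd, F_GETFD) != -1; }
```

In a child that will never use the library that opened those descriptors, a call such as `CloseFdRange(0, 15)` right after fork() would drop stdin, stdout, stderr and any early duplicates in one pass. The value 15 is only an example.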
That said, I might have to change how we start our servers in CI anyway. I'd like to be checking adios xrootd functionality in CI, which means starting up an xrootd server and shutting it down on demand. It looks like there's a -b argument for xrootd, and a mechanism for putting the server PID in a file (which will help with shutdown, maybe with -HUP?). But I don't know yet if we'd face the same stderr issues we ran into with our remote server, and if we do I can't modify xrootd to fix it in quite the same way.
If at all possible, it would be helpful if you could include a "where" at each breakpoint, so we know where each call is actually coming from. Clearly, we did not expect that many "dups" to occur, so it is mystifying. Additionally, review of the code indicates that none of those could occur from a static initialization. So, we are doubly mystified.
Andy
(In reply to Greg Eisenhauer's message of Saturday, May 25, 2024 10:03 AM on PR #4178.)
Not a problem, though at the moment I've been working in an OSX development environment, so it's an lldb log. I saw the same CTest-hanging-on-server-startup behavior on Ubuntu, and will get you a log from there when I get a chance as well.
lldb_log.txt
Looking at this a little bit more, I'd be suspicious of the static XrdSysLogger entries in various cpp files.
XrdCryptoAux.cpp:
// For error logging and tracing
static XrdSysLogger Logger;
The constructor is going to cause the dup...
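One conventional way to avoid doing this work at static-initialization time is construct-on-first-use. The sketch below is only an illustration of that idiom, not a proposed patch to XRootD: `LazyLogger` and `GetLogger` are hypothetical names, and the constructor just dups stderr as a placeholder for whatever the real logger does.

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical logger type; the dup() mimics the behavior under discussion.
struct LazyLogger
{
    static int instances; // number of loggers constructed so far
    int fd;
    LazyLogger() : fd(dup(STDERR_FILENO)) { ++instances; }
    ~LazyLogger()
    {
        if (fd >= 0)
            close(fd);
    }
};
int LazyLogger::instances = 0;

// Construct-on-first-use: the function-local static is built the first time
// control passes through its declaration, not during library initialization.
LazyLogger &GetLogger()
{
    static LazyLogger logger;
    assert(LazyLogger::instances == 1); // this accessor builds exactly one
    return logger;
}
```

With this shape, a program that links the library but never logs would not pay for the dup at all, since nothing is constructed until `GetLogger()` is first called.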
Good point! Old code from the dim dark past, sigh. I'll see what we can do about it.
Andy
(In reply to Greg Eisenhauer's message of Sunday, May 26, 2024 5:24 AM on PR #4178.)
OK. I'll adjust this PR a bit, but keep the basics. Whatever you change, we'll still have to deal with existing XRootD releases.