McCode
McCode copied to clipboard
Limit on total number of McStas monitors (segfault)
There seems to be an upper limit on the number of monitors allowed in a single instrument, which is much lower than the total component limit. On McStas 2.5 I found it to be a maximum of 377, with anything above producing a segfault at runtime. I have attached an instrument that use 378 PSD_monitors which fails in this manner (txt ending as GitHub didn't want instr).
I have tried combinations of different monitors too, as long as the total reaches more than 377 the segfault occurs. The number of histogram bins in each monitor does not impact the total allowed number of monitors.
Thanks for the bug report @mads-bertelsen. As discussed elsewhere already, I believe this may be system dependent. I have successfully fun the instrument on macOS 11 Big Sur with McStas 2.5, 2.6.1, 2.7 and 3.0.
My compiler is
Apple clang version 12.0.0 (clang-1200.0.32.28)
Target: x86_64-apple-darwin20.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
I will be back with more observations later :-)
Also works OK on Catalina with 2.5-generated code and
Apple clang version 12.0.0 (clang-1200.0.32.28)
Target: x86_64-apple-darwin19.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
I have installed myself a fresh macOS 11.13 High Sierra VM, and loaded up a McStas 2.5. On this system I have
I can also not reproduce the issue here??
It does however seem that my compiler is slightly different still from what @mads-bertelsen initially reported:
Apple LLVM version 10.0.0 (clang-1000.11.45.5)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
which seems to be the "full" Xcode rather than the slim "command line tools". Will now try loading a full Xcode next
Ah! Now reproduced without updating Xcode - with:
mcstas n_component.txt
clang -lm n_component.txt.c
./a.out
The default McStas CFLAGS include compiler optimisation to level -O2 which is what I tested with before - and where the segfault did NOT occur above... Revisiting Catalina and Big Sur...
OK, now also reproduced on Big Sur for the 2.x releases, 3.0 does not behave this way:
- and the moment one uses the -O2 option used by default in McStas the issue goes away...
Interesting, sorry I didn't mention I was running mcstas without the -O2 flag, was purely because of the long compile times for testing. Surprising that -O2 somehow fixes the problem, would think -O2 introduced risk of issues rather than fixing them.
@mads-bertelsen agreed, this is a bit surprising. But maybe (by sheer luck) -O2 clears out e.g. some unused vars/arrays that are part of the equation?
I tried running on Linux with gcc, no issue there - so looks quite platform dependent. @ebknudsen is quite good at debugging with gdb, but I believe this is not available on Mac, so to really figure out what is going on I guess we really ought to learn ourselves some https://lldb.llvm.org...
Or maybe some Valgrind could hint in the direction of some issues? On the other hand, that is mostly for dynamic arrays etc., right? (Which maybe are not really the be the issue here: McStas 2.5 has static arrays in the PSD's, 2.6.1/2.7 uses dynamic ones...)
Since it only happens in monitors, it is clear DETECTOR_OUT macros are involved. Removing these from the SAVE section fixes the problem. Digging into these in mccode-r.c I found that specifically it is the call to
detector = mcdetector_statistics(detector);
Commenting this line out of the mcdetector_import function fixes the issue. The strange thing is that if I leave it in, but return from that function immediately using
return(detector);
The segfault still happens. It should bypass the function completely. I can't even have that function print a single line, so the error somehow happens in the call of the function itself rather than in the function.
EDIT: It seems earlier printf statements are also not printed when a segfault occurs, so maybe its not directly in the call. The failure is connected to mcdetector_statistics though, since it works without it.
on a side note: you can't always trust printf's to be echoed to the screen in case of segfaults, as they are buffered, and the program may exit before the buffer is flushed.
I have found a surprising fix by modifying mccode-r.c
EDIT: This fix only extend the limit with a few extra monitors, not enough for our use-case. It may serve as a indicator of what the problem is.
The function mcdetector_statistics takes a MCDETECTOR and returns an MCDETECTOR, but its the call with an MCDETECTOR input at that stage which causes the crash. Behind the senses a copy operation happens. For some reason that fails. I changed the function to take a pointer, and declare a new MCDETECTOR by starting the function like this:
MCDETECTOR mcdetector_statistics(
MCDETECTOR *p_detector)
{
MCDETECTOR detector;
detector = *p_detector;
And changed the call to
detector = mcdetector_statistics(&detector);
Now everything seems to work normally. Have attached the new mccode-r.c. I am unsure if this is the appropriate way to fix this issue, so haven't started a branch with a fix. mccode-r.txt
It may be bad practice to write all these mcdetector functions to take this large MCDETECTOR struct and return it, each time making a big unnecessary copy operation instead of just providing pointers to the existing MCDETECTOR struct.
Continued debugging of the mcdetector_import makes it clear something strange is going on. At high monitor counts, these innocent looking lines with compatible types (long in struct, long given to function) is causing the crash:
detector.n = n;
detector.p = p;
detector.m = m;
By printing values for detector.n, detector.p and detector.m, it became clear that before the above code that sets them for this detector, they are somehow initialized to the values of the previous detector instance. I don't understand how that can be. A new struct is declared with static variables, and some of these already have values that were set by a previous function call that declared another version of that struct.
EDIT: Here I comment all parts of mcdetector_import after the above lines but excluding the last return statement to understand exactly where the crash occurs.
Just to be clear - is this before or after the above fix to mccode-r.c was applied?
In the above I didn't even run the mcdetector_statistics function, at high enough monitor counts the crash happens without it. Added an edit showing that I commented out the remainder of the code in the function, including that call.