root
root copied to clipboard
Option to reduce startup syscalls via environment caching
This commit provides the possibility to pass system library search paths as well as some compiler include paths to ROOT as environment variables. This has the advantage that ROOT will spawn less sub-processes and we can do the setup only once, instead of doing it for every single executable that is linked to ROOT.
The commit does not change any default behaviour! Rather, expert-users may use the new feature by moving the initialization of the search paths to say software environment loading.
In ALICE, we do something like
export ROOT_LDSYSPATH=$(LD_DEBUG=libs LD_PRELOAD=DOESNOTEXIST ls /tmp/DOESNOTEXIST 2>&1 | grep -m 1 "system search path" | sed 's/.*=//g' | awk '//{print $1}')
export ROOT_CPPSYSINCL=$(LC_ALL=C c++ -xc++ -E -v /dev/null 2>&1 | sed -n '/^.include/,${/^ \/.*++/{p}}' | tr '\n' ':' | tr ' ' ':')
speeding up the initialization of our executables at runtime and doing less syscalls that create short-lived processes, for instance calling the compiler.
The effect from this operation can be seen by counting the execve
syscalls in a small example:
strace -e execve -f root.exe -q -e "double x=1;" # ---> 14 calls
export ROOT_LDSYSPATH=...
export ROOT_CPPSYSINCL=...
strace -e execve -f root.exe -q -e "double x=1;" # ---> 6 calls
This gain can accumulate to significant savings when used in a multi-process environment such as ALICE is using.
Can one of the admins verify this patch?
So far we had this patch in our local fork but we would prefer to have it integrated into the mainstream repository.
This gain can accumulate to significant savings
Can you give us an idea of the costs of the syscalls?
@phsft-bot build
Starting build on ROOT-debian10-i386
/soversion
, ROOT-performance-centos8-multicore
/cxx17
, ROOT-ubuntu18.04
/nortcxxmod
, ROOT-ubuntu2004
/python3
, mac12
/noimt
, mac11
/cxx14
, windows10
/cxx14
How to customize builds
LIBRARY_PATH
They don't. Just tested it. One of the reasons is certainly that TUnixSystem is not part of Clang.
We have significant improvements thanks to the changes of Sandro. You can see what was the problem and the effect of the modifications in slides 3-8 from the presentation of Marta: https://indico.cern.ch/event/877541/contributions/3697544/attachments/1967633/4324429/PayloadControl-T1T2Workshop.pptx.pdf
Just to add, I think the actual timing gain depends heavily on the use-case and software environment. In the end, for us, this is not even the decisive factor. What is more important is the reduction of noise processes when we do profiling/monitoring and a less heavy burden on process management in the /proc space. Our data taking workflows have O(100) of "microservices" that link to ROOT. Just starting up this system created O(1000) calls to "c++, ... " to find out some paths. This problem is now essentially gone.
By any means I am not against this change. However, since such overhead is a significant problem for your O(N) microservices and saving a few calls makes a difference, have you considered using checkpointing to cache ROOT startup operations altogether? A very simplistic way to achieve the checkpointing is to implement a core dump. That would probably yield better results AFAICT.
Test Results
5 files 5 suites 1d 3h 9m 5s :stopwatch: 2 420 tests 2 412 :heavy_check_mark: 0 :zzz: 8 :x: 11 942 runs 11 919 :heavy_check_mark: 0 :zzz: 23 :x:
For more details on these failures, see this check.
Results for commit 7475b881.
@vgvassilev : Your idea sounds nice. Implementation of such a system is, however, not realistic for us at the moment and it would also go far far beyond the simple step here.
Of course, we'd be happy to address smaller comments (naming of things, etc.) if there are any.
This is indeed a good improvement. A few comments/opinions:
- The variable are used within
Cling
and thus should probably be prefixed byCLING_
rather thanROOT_
- The setting used/necessary is complex and maybe we ought to provide a simpler mechanism (some thing similar to
export CLING_LDSYSPATH=`cling print_ld_syspath();`
- The (pre-existing) duplication of the code/feature dealing with
ldsyspath
is less than optimal.
@pcanal I don't understand
(some thing similar to export CLING_LDSYSPATH=cling print_ld_syspath();`
Could you edit your comment to fix this or elaborate?
(some thing similar to export CLING_LDSYSPATH=cling print_ld_syspath();`
I actually meant: CLING_LDSYSPATH=`cling print_ld_syspath();`
(missing the execution request).
And would require the cling runtime to have access to a function named print_ld_syspath
which you actually print the value of LDSYSPATH
that cling calculates and uses when CLING_LDSYSPATH
is not set.
How can we make some progress here? It looks as if @pcanal suggestions would be future improvements (apart from renaming few variables).
A couple concern.
export ROOT_LDSYSPATH=...
export ROOT_CPPSYSINCL=...
root.exe -q -e "double x=1;"
works perfectly but
export ROOT_LDSYSPATH=...
export ROOT_CPPSYSINCL=...
PATH=/location_of_another_gcc/bin:${PATH}
root.exe -q -e "double x=1;"
fails.
Also the list and calculation for this type of information might change overtime and so the 'hardcoded' information in your bash script might fail.
What if we had a wrapper binary that would set some (private) variables and send fork the main processes (i.e. the one that will eventually launch all your production processes)? I.e. instead of
... setup environment ...
launch_O2 some args # this is the process that eventually fork all the tasks
have:
... setup environment ...
preset_expensive_root_vars launch_O2 some args # this is the process that eventually fork all the tasks
where preset_expensive_root_vars
is provided by us.
@pcanal : I agree that there is a limited danger of variable inconsistency. However, in ALICE, we are already doing it in a way similar to what you suggest: the ROOT variables are dynamically calculated and preset (before ROOT is launched) as part of the software stack initialization. There are no hard-coded things (and no one is supposed to change compiler paths on the GRID).
Since we launch many ALICE-O2 executables, the preset actually needs to be done on a global level instead for every executable (otherwise we would not gain anything).
the preset actually needs to be done on a global level
The proposal was to wrap this global level (or an inner part there of). i.e. where as you currently have something 'like':
part1();
export ROOT_SETUP_VAR=`eval something()`
part2();
you could do:
part1();
root_setup_wrapper part2();
where the line root_setup_wrapper
is designed to have the same net result of the last 2 lines of the original.