root icon indicating copy to clipboard operation
root copied to clipboard

Option to reduce startup syscalls via environment caching

Open sawenzel opened this issue 1 year ago • 19 comments

This commit provides the possibility to pass system library search paths as well as some compiler include paths to ROOT as environment variables. This has the advantage that ROOT will spawn less sub-processes and we can do the setup only once, instead of doing it for every single executable that is linked to ROOT.

The commit does not change any default behaviour! Rather, expert-users may use the new feature by moving the initialization of the search paths to say software environment loading.

In ALICE, we do something like

export ROOT_LDSYSPATH=$(LD_DEBUG=libs LD_PRELOAD=DOESNOTEXIST ls /tmp/DOESNOTEXIST 2>&1 | grep -m 1 "system search path" | sed 's/.*=//g' | awk '//{print $1}')

export ROOT_CPPSYSINCL=$(LC_ALL=C c++ -xc++ -E -v /dev/null 2>&1 | sed -n '/^.include/,${/^ \/.*++/{p}}' | tr '\n' ':' | tr ' ' ':')

speeding up the initialization of our executables at runtime and doing less syscalls that create short-lived processes, for instance calling the compiler.

The effect from this operation can be seen by counting the execve syscalls in a small example:

strace -e execve -f root.exe -q -e "double x=1;"  # ---> 14 calls

export ROOT_LDSYSPATH=...
export ROOT_CPPSYSINCL=...
strace -e execve -f root.exe -q -e "double x=1;"  # ---> 6 calls

This gain can accumulate to significant savings when used in a multi-process environment such as ALICE is using.

sawenzel avatar Apr 04 '23 15:04 sawenzel

Can one of the admins verify this patch?

phsft-bot avatar Apr 04 '23 15:04 phsft-bot

So far we had this patch in our local fork but we would prefer to have it integrated into the mainstream repository.

sawenzel avatar Apr 04 '23 15:04 sawenzel

Thanks, @sawenzel !

Would CPLUS_INCLUDE_PATH (see clang docs) and LIBRARY_PATH do the same job?

Axel-Naumann avatar Apr 04 '23 15:04 Axel-Naumann

This gain can accumulate to significant savings

Can you give us an idea of the costs of the syscalls?

Axel-Naumann avatar Apr 04 '23 15:04 Axel-Naumann

@phsft-bot build

Axel-Naumann avatar Apr 04 '23 15:04 Axel-Naumann

Starting build on ROOT-debian10-i386/soversion, ROOT-performance-centos8-multicore/cxx17, ROOT-ubuntu18.04/nortcxxmod, ROOT-ubuntu2004/python3, mac12/noimt, mac11/cxx14, windows10/cxx14 How to customize builds

phsft-bot avatar Apr 04 '23 15:04 phsft-bot

LIBRARY_PATH

They don't. Just tested it. One of the reasons is certainly that TUnixSystem is not part of Clang.

sawenzel avatar Apr 04 '23 15:04 sawenzel

We have significant improvements thanks to the changes of Sandro. You can see what was the problem and the effect of the modifications in slides 3-8 from the presentation of Marta: https://indico.cern.ch/event/877541/contributions/3697544/attachments/1967633/4324429/PayloadControl-T1T2Workshop.pptx.pdf

pzhristov avatar Apr 04 '23 15:04 pzhristov

Just to add, I think the actual timing gain depends heavily on the use-case and software environment. In the end, for us, this is not even the decisive factor. What is more important is the reduction of noise processes when we do profiling/monitoring and a less heavy burden on process management in the /proc space. Our data taking workflows have O(100) of "microservices" that link to ROOT. Just starting up this system created O(1000) calls to "c++, ... " to find out some paths. This problem is now essentially gone.

sawenzel avatar Apr 04 '23 16:04 sawenzel

By any means I am not against this change. However, since such overhead is a significant problem for your O(N) microservices and saving a few calls makes a difference, have you considered using checkpointing to cache ROOT startup operations altogether? A very simplistic way to achieve the checkpointing is to implement a core dump. That would probably yield better results AFAICT.

vgvassilev avatar Apr 04 '23 16:04 vgvassilev

Test Results

         5 files           5 suites   1d 3h 9m 5s :stopwatch:   2 420 tests   2 412 :heavy_check_mark: 0 :zzz:   8 :x: 11 942 runs  11 919 :heavy_check_mark: 0 :zzz: 23 :x:

For more details on these failures, see this check.

Results for commit 7475b881.

github-actions[bot] avatar Apr 04 '23 18:04 github-actions[bot]

@vgvassilev : Your idea sounds nice. Implementation of such a system is, however, not realistic for us at the moment and it would also go far far beyond the simple step here.

Of course, we'd be happy to address smaller comments (naming of things, etc.) if there are any.

sawenzel avatar Apr 19 '23 13:04 sawenzel

This is indeed a good improvement. A few comments/opinions:

  • The variable are used within Cling and thus should probably be prefixed by CLING_ rather than ROOT_
  • The setting used/necessary is complex and maybe we ought to provide a simpler mechanism (some thing similar to
export CLING_LDSYSPATH=`cling print_ld_syspath();`
  • The (pre-existing) duplication of the code/feature dealing with ldsyspath is less than optimal.

pcanal avatar Apr 19 '23 15:04 pcanal

@pcanal I don't understand

(some thing similar to export CLING_LDSYSPATH=cling print_ld_syspath();`

Could you edit your comment to fix this or elaborate?

Axel-Naumann avatar Apr 26 '23 06:04 Axel-Naumann

(some thing similar to export CLING_LDSYSPATH=cling print_ld_syspath();`

I actually meant: CLING_LDSYSPATH=`cling print_ld_syspath();` (missing the execution request). And would require the cling runtime to have access to a function named print_ld_syspath which you actually print the value of LDSYSPATH that cling calculates and uses when CLING_LDSYSPATH is not set.

pcanal avatar Apr 26 '23 14:04 pcanal

How can we make some progress here? It looks as if @pcanal suggestions would be future improvements (apart from renaming few variables).

sawenzel avatar Aug 15 '23 07:08 sawenzel

A couple concern.

export ROOT_LDSYSPATH=...
export ROOT_CPPSYSINCL=...
root.exe -q -e "double x=1;"

works perfectly but

export ROOT_LDSYSPATH=...
export ROOT_CPPSYSINCL=...
PATH=/location_of_another_gcc/bin:${PATH}
root.exe -q -e "double x=1;"

fails.

Also the list and calculation for this type of information might change overtime and so the 'hardcoded' information in your bash script might fail.

What if we had a wrapper binary that would set some (private) variables and send fork the main processes (i.e. the one that will eventually launch all your production processes)? I.e. instead of

... setup environment ...
launch_O2 some args # this is the process that eventually fork all the tasks

have:

... setup environment ...
preset_expensive_root_vars launch_O2 some args # this is the process that eventually fork all the tasks

where preset_expensive_root_vars is provided by us.

pcanal avatar Aug 16 '23 16:08 pcanal

@pcanal : I agree that there is a limited danger of variable inconsistency. However, in ALICE, we are already doing it in a way similar to what you suggest: the ROOT variables are dynamically calculated and preset (before ROOT is launched) as part of the software stack initialization. There are no hard-coded things (and no one is supposed to change compiler paths on the GRID).

Since we launch many ALICE-O2 executables, the preset actually needs to be done on a global level instead for every executable (otherwise we would not gain anything).

sawenzel avatar Aug 17 '23 08:08 sawenzel

the preset actually needs to be done on a global level

The proposal was to wrap this global level (or an inner part there of). i.e. where as you currently have something 'like':

part1();
export ROOT_SETUP_VAR=`eval something()`
part2();

you could do:

part1();
root_setup_wrapper part2();  

where the line root_setup_wrapper is designed to have the same net result of the last 2 lines of the original.

pcanal avatar Aug 21 '23 22:08 pcanal