Strange behaviour when using conditional dependency
I have a test with an optional dependency, see below. I use the Blender_CompileShaders test to force a one-time action (needed whenever the NVIDIA driver has changed): compiling the NVIDIA shaders before rendering. This can take quite some time, and I don't want that time to pollute the actual render results of Blender_RIOW. But I do want to keep track of the precompile time, hence having it as a separate test that gets logged.
@rfm.simple_test
class Blender_CompileShaders(rfm.RunOnlyRegressionTest):
    descr = 'Force Blender CUDA shader compilation'
    valid_systems = ['snellius:gpu_a100', 'snellius:gpu_h100']
    ...

class BlenderTestBase(rfm.RunOnlyRegressionTest):
    descr = 'Blender %s render benchmark' % BLENDER_VERSION
    valid_systems = [
        'snellius:rome', 'snellius:genoa', 'snellius:fat',
        'snellius:gpu_a100', 'snellius:gpu_h100',
        'snellius:himem_4tb', 'snellius:himem_8tb'
    ]
    ...

def dep_gpu_only(src, dst):
    # src and dst are (partition, environment) tuples; the print is for debugging
    print(src, dst, dst[0].startswith('gpu_'))
    return dst[0].startswith('gpu_')

@rfm.simple_test
class Blender_RIOW(BlenderTestBase):
    descr = 'Blender render benchmark'

    @run_after('init')
    def inject_dependencies(self):
        self.depends_on('Blender_CompileShaders', how=dep_gpu_only)
    ....
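As a quick sanity check on the predicate itself, here is a minimal standalone sketch (no ReFrame import) that calls it with (partition, environment) tuples of the shape printed in the GPU run further down. The debug print is dropped here. The `('genoa', 'eb-foss')` tuple is an assumption: it never shows up in any output, and is only a guess at what a CPU-partition test case would look like.

```python
def dep_gpu_only(src, dst):
    """Return True when the target test case sits on a GPU partition."""
    # dst is assumed to be a (partition_name, environment_name) tuple
    return dst[0].startswith('gpu_')


# Tuple shape taken from the GPU run output
gpu_case = ('gpu_a100', 'eb-foss')
# Hypothetical CPU-side tuple, never actually printed by ReFrame in these runs
cpu_case = ('genoa', 'eb-foss')

gpu_result = dep_gpu_only(gpu_case, gpu_case)
cpu_result = dep_gpu_only(cpu_case, cpu_case)
print(gpu_result, cpu_result)
```

So the predicate on its own would reject the dependency on a CPU partition, provided it is actually called with such a tuple.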
The funky thing here is that the Blender_RIOW test is run on all of our nodes, including non-GPU ones, while the Blender_CompileShaders dependency only makes sense on GPU nodes. Hence the valid_systems = ['snellius:gpu_a100', 'snellius:gpu_h100'] in that class.
However, this seems to trip up ReFrame somewhat. When I run the test on a GPU node all is well, and I can see the dep_gpu_only() call being made and returning True:
snellius paulm@int4 08:59 ~/reframe-surf$ reframe -C settings_files/settings.py -c production_tests --mode=production --system snellius:gpu_a100 -r -n 'Blender_CompileShaders' -n 'Blender_RIOW'
[ReFrame Setup]
version: 4.6.1
command: '/sw/arch/RHEL8/EB_production/2023/software/ReFrame/4.6.1/bin/reframe -C settings_files/settings.py -c production_tests --mode=production --system snellius:gpu_a100 -r -n Blender_CompileShaders -n Blender_RIOW'
launched by: paulm@int4
working directory: '/gpfs/home4/paulm/reframe-surf'
settings files: '<builtin>', 'settings_files/settings.py'
check search path: (R) '/gpfs/home4/paulm/reframe-surf/production_tests'
stage directory: '/scratch-shared/paulm/reframe_output/staging/2024-07-16_08-59-27'
output directory: '/home/paulm/.reframe/production/output/2024-07-16_08-59-27'
log files: '/gpfs/home4/paulm/reframe-surf/reframe.log', '/gpfs/home4/paulm/reframe-surf/reframe.out'
('gpu_a100', 'eb-foss') ('gpu_a100', 'eb-foss') True
('gpu_a100', 'eb-foss') ('gpu_a100', 'eb-foss') True
[==========] Running 2 check(s)
[==========] Started on Tue Jul 16 08:59:42 2024+0200
[----------] start processing checks
[ RUN ] Blender_CompileShaders /ed1c9d95 @snellius:gpu_a100+eb-foss
[ OK ] (1/2) Blender_CompileShaders /ed1c9d95 @snellius:gpu_a100+eb-foss
P: kernel_loading: 0.45999999999999996 s (r:0, l:None, u:None)
[ RUN ] Blender_RIOW /214f6d42 @snellius:gpu_a100+eb-foss
[ OK ] (2/2) Blender_RIOW /214f6d42 @snellius:gpu_a100+eb-foss
P: render: 6.28 s (r:0, l:None, u:None)
P: max_error: 0.00784314 unitless (r:0, l:None, u:None)
[----------] all spawned checks have finished
[ PASSED ] Ran 2/2 test case(s) from 2 check(s) (0 failure(s), 0 skipped, 0 aborted)
[==========] Finished on Tue Jul 16 09:01:00 2024+0200
===============================================================================================================================================================================
PERFORMANCE REPORT
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[Blender_CompileShaders /ed1c9d95 @snellius:gpu_a100:eb-foss]
num_tasks_per_node: 1
num_gpus_per_node: 4
num_cpus_per_task: 72
num_tasks: 1
performance:
- kernel_loading: 0.45999999999999996 s (r: 0 s l: -inf% u: +inf%)
[Blender_RIOW /214f6d42 @snellius:gpu_a100:eb-foss]
num_tasks_per_node: 1
num_gpus_per_node: 4
num_cpus_per_task: 72
num_tasks: 1
performance:
- render: 6.28 s (r: 0 s l: -inf% u: +inf%)
- max_error: 0.00784314 unitless (r: 0 unitless l: -inf% u: +inf%)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Log file(s) saved in '/gpfs/home4/paulm/reframe-surf/reframe.log', '/gpfs/home4/paulm/reframe-surf/reframe.out'
But when I run it on a non-GPU node I get warnings related to dependency resolution, dep_gpu_only() never gets called, and two tests are (incorrectly) skipped:
snellius paulm@int4 09:01 ~/reframe-surf$ reframe -C settings_files/settings.py -c production_tests --mode=production --system snellius:genoa -r -n 'Blender_CompileShaders' -n 'Blender_RIOW'
[ReFrame Setup]
version: 4.6.1
command: '/sw/arch/RHEL8/EB_production/2023/software/ReFrame/4.6.1/bin/reframe -C settings_files/settings.py -c production_tests --mode=production --system snellius:genoa -r -n Blender_CompileShaders -n Blender_RIOW'
launched by: paulm@int4
working directory: '/gpfs/home4/paulm/reframe-surf'
settings files: '<builtin>', 'settings_files/settings.py'
check search path: (R) '/gpfs/home4/paulm/reframe-surf/production_tests'
stage directory: '/scratch-shared/paulm/reframe_output/staging/2024-07-16_09-02-09'
output directory: '/home/paulm/.reframe/production/output/2024-07-16_09-02-09'
log files: '/gpfs/home4/paulm/reframe-surf/reframe.log', '/gpfs/home4/paulm/reframe-surf/reframe.out'
WARNING: could not resolve dependency: ('Blender_RIOW', 'snellius:genoa', 'eb-foss') -> 'Blender_CompileShaders'
WARNING: could not resolve dependency: ('Blender_HoleInTheRoof', 'snellius:genoa', 'eb-foss') -> 'Blender_CompileShaders'
WARNING: skipping all dependent test cases
- ('Blender_RIOW', 'snellius:genoa', 'eb-foss')
- ('Blender_HoleInTheRoof', 'snellius:genoa', 'eb-foss')
[==========] Running 0 check(s)
[==========] Started on Tue Jul 16 09:02:27 2024+0200
[----------] start processing checks
[----------] all spawned checks have finished
[ PASSED ] Ran 0/0 test case(s) from 0 check(s) (0 failure(s), 0 skipped, 0 aborted)
[==========] Finished on Tue Jul 16 09:02:27 2024+0200
Log file(s) saved in '/gpfs/home4/paulm/reframe-surf/reframe.log', '/gpfs/home4/paulm/reframe-surf/reframe.out'
Now I can understand that Blender_CompileShaders gets filtered out due to its valid_systems not including the system I'm running the test on. But why would this cause the self.depends_on() in Blender_RIOW to not call dep_gpu_only() at all? Shouldn't it evaluate that function first, and only when the dependency is needed check if it can be found?
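The two orders of evaluation in question can be sketched with a toy model. Neither function is taken from ReFrame; the names `resolve_target_first`, `resolve_predicate_first` and `candidate_cases` are hypothetical. The first order would be consistent with the output above (a warning, and the predicate never called), but that is only an inference from the log.

```python
calls = []  # records every invocation of the predicate


def dep_gpu_only(src, dst):
    calls.append((src, dst))
    return dst[0].startswith('gpu_')


def resolve_target_first(src_case, target_cases, how):
    """Look up the target's test cases first, then filter them with `how`."""
    if not target_cases:
        # The target was filtered out entirely, so `how` is never reached
        raise LookupError('could not resolve dependency')
    return [dst for dst in target_cases if how(src_case, dst)]


def resolve_predicate_first(src_case, candidate_cases, target_cases, how):
    """Ask `how` which cases are wanted, then check that those exist."""
    wanted = [dst for dst in candidate_cases if how(src_case, dst)]
    missing = [dst for dst in wanted if dst not in target_cases]
    if missing:
        raise LookupError('could not resolve dependency')
    return wanted


src = ('genoa', 'eb-foss')   # hypothetical CPU-side test case
target_cases = []            # the target has no test cases on this partition

try:
    resolve_target_first(src, target_cases, dep_gpu_only)
    target_first_failed = False
except LookupError:
    target_first_failed = True
calls_after_target_first = len(calls)

deps = resolve_predicate_first(src, [src], target_cases, dep_gpu_only)
calls_after_predicate_first = len(calls)
```

In this model the target-first order fails without ever consulting the predicate, while the predicate-first order calls it once, gets False, and ends up with no dependency and no error.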
It is also interesting that the output lists a second test case, Blender_HoleInTheRoof, which is indeed defined, but which I did not ask for with -n on the command line.
This is with ReFrame 4.6.1.
Edit: some wording