Bugs when transforming mixed-precision kernels
As found here: https://github.com/stfc/PSyclone/issues/2663#issuecomment-2332037300
The problem seems to be that we rename the interface rather than the actual subroutine that is being called.
Fixing the interface renaming was relatively simple and I've now done that. Unfortunately, while testing I discovered that our 'clever' code that attempts to return the correct Routine PSyIR by looking at the signature takes no account of precision! This is because it uses LFRicTypes and it seems that that class doesn't account for precision. I can't remember what the thinking was here - I need to search through the Issues and see what I can find.
Sergi suggested trying to ModuleInline the kernel first and then apply ACCRoutineTrans to it. However, that doesn't help because KernelModuleInlineTrans uses the same broken code to get the PSyIR of the routine to inline.
In fact, calling ACCRoutineTrans on the inlined kernel also fails because:
Transformation Error: routine 'matrix_vector_code_r_double' calls another routine 'MATMUL(matrix(:,:,ik), x_e)' which is not available on the accelerator device and therefore cannot have ACCRoutineTrans applied to it (TODO #342).
I thought we had MATMUL tagged as being available on the GPU?
EDIT: it's not but perhaps it could be?
I'm making good progress with the plumbing but am now hitting errors because of the plain Symbols that are created in a kernel's symbol table for all of the reserved names. I need to work out why that isn't a problem on master.
While working on this, I found this test (in lfric_kern_test.py):
def test_get_kernel_schedule_mixed_precision():
'''
Test that we can get the correct schedule for a mixed-precision kernel.
'''
api_config = Config.get().api_conf(TEST_API)
_, invoke = get_invoke("26.8_mixed_precision_args.f90", TEST_API,
name="invoke_0", dist_mem=False)
sched = invoke.schedule
kernels = sched.walk(LFRicKern, stop_type=LFRicKern)
# 26.8 contains an invoke of five kernels, one each at the following
# precisions.
kernel_precisions = ["r_def", "r_solver", "r_tran", "r_bl", "r_phys"]
# Get the precision (in bytes) for each of these.
precisions = [api_config.precision_map[name] for
name in kernel_precisions]
# Check that the correct kernel implementation is obtained for each
# one in the invoke.
for precision, kern in zip(precisions, kernels):
sched = kern.get_kernel_schedule()
assert isinstance(sched, KernelSchedule)
assert sched.name == f"mixed_code_{8*precision}"
so the precision-matching clearly does work in some situations. It may be that it simply has a bug for scalar arguments?
For the record, the code that we used to use to attempt to resolve the correct routine implementation was:
# The kernel name corresponds to an interface block. Find which
# of the routines matches the precision of the arguments.
for routine in routines:
try:
# The validity check for the kernel arguments should raise
# an exception if the precisions don't match but currently
# this fails for scalar arguments.
self.validate_kernel_code_args(routine.symbol_table)
sched = routine
break
except GenerationError:
pass
else:
raise GenerationError(
f"Failed to find a kernel implementation with an interface"
f" that matches the invoke of '{self.name}'. (Tried "
f"routines {[item.name for item in routines]}.)")
I'm putting this here as I'm going to remove from the code base (LFRicKern.get_kernel_schedule()) it for the moment.