PSyclone icon indicating copy to clipboard operation
PSyclone copied to clipboard

Bugs when transforming mixed-precision kernels

Open arporter opened this issue 1 year ago • 6 comments

As found here: https://github.com/stfc/PSyclone/issues/2663#issuecomment-2332037300

The problem seems to be that we rename the interface rather than the actual subroutine that is being called.

arporter avatar Sep 17 '24 15:09 arporter

Fixing the interface renaming was relatively simple and I've now done that. Unfortunately, while testing I discovered that our 'clever' code that attempts to return the correct Routine PSyIR by looking at the signature takes no account of precision! This is because it uses LFRicTypes and it seems that that class doesn't account for precision. I can't remember what the thinking was here - I need to search through the Issues and see what I can find.

arporter avatar Sep 18 '24 15:09 arporter

Sergi suggested trying to ModuleInline the kernel first and then apply ACCRoutineTrans to it. However, that doesn't help because KernelModuleInlineTrans uses the same broken code to get the PSyIR of the routine to inline.

arporter avatar Sep 23 '24 10:09 arporter

In fact, calling ACCRoutineTrans on the inlined kernel also fails because:

 Transformation Error: routine 'matrix_vector_code_r_double' calls another routine 'MATMUL(matrix(:,:,ik), x_e)' which is not available on the accelerator device and therefore cannot have ACCRoutineTrans applied to it (TODO #342).

I thought we had MATMUL tagged as being available on the GPU?

EDIT: it's not but perhaps it could be?

arporter avatar Sep 23 '24 10:09 arporter

I'm making good progress with the plumbing but am now hitting errors because of the plain Symbols that are created in a kernel's symbol table for all of the reserved names. I need to work out why that isn't a problem on master.

arporter avatar Oct 02 '24 16:10 arporter

While working on this, I found this test (in lfric_kern_test.py):

def test_get_kernel_schedule_mixed_precision():
    '''
    Test that we can get the correct schedule for a mixed-precision kernel.

    '''
    api_config = Config.get().api_conf(TEST_API)
    _, invoke = get_invoke("26.8_mixed_precision_args.f90", TEST_API,
                           name="invoke_0", dist_mem=False)
    sched = invoke.schedule
    kernels = sched.walk(LFRicKern, stop_type=LFRicKern)
    # 26.8 contains an invoke of five kernels, one each at the following
    # precisions.
    kernel_precisions = ["r_def", "r_solver", "r_tran", "r_bl", "r_phys"]
    # Get the precision (in bytes) for each of these.
    precisions = [api_config.precision_map[name] for
                  name in kernel_precisions]
    # Check that the correct kernel implementation is obtained for each
    # one in the invoke.
    for precision, kern in zip(precisions, kernels):
        sched = kern.get_kernel_schedule()
        assert isinstance(sched, KernelSchedule)
        assert sched.name == f"mixed_code_{8*precision}"

so the precision-matching clearly does work in some situations. It may be that it simply has a bug for scalar arguments?

arporter avatar Oct 03 '24 10:10 arporter

For the record, the code that we used to use to attempt to resolve the correct routine implementation was:

       # The kernel name corresponds to an interface block. Find which
       # of the routines matches the precision of the arguments.
       for routine in routines:
           try:
               # The validity check for the kernel arguments should raise
               # an exception if the precisions don't match but currently
               # this fails for scalar arguments.
               self.validate_kernel_code_args(routine.symbol_table)
               sched = routine
               break
            except GenerationError:
               pass
        else:
            raise GenerationError(
                f"Failed to find a kernel implementation with an interface"
                f" that matches the invoke of '{self.name}'. (Tried "
                f"routines {[item.name for item in routines]}.)")

I'm putting this here as I'm going to remove from the code base (LFRicKern.get_kernel_schedule()) it for the moment.

arporter avatar Oct 09 '24 09:10 arporter