quda
quda copied to clipboard
Use kernel arg
We first make sure when BlasArg is instantiated, the argument for functor caxpyxmazMR_ is truly passed as a kernel argument. Then we make use_kernel_arg an int, and introduce it to BlasFunctor and caxpyxmazMR_, and let device::use_kernel_arg know that we really want use_kernel_arg if its value is more than 1.
Jenkins: Can one of the admins verify this patch?
What's the motivation for this?
This ensures that BlasArg for caxpyxmazMR_ is private to each thread, even if device::max_kernel_arg_size() is 0. It provides backend implementations freedom to choose different ways to pass kernel arguments without accidentally break the code.
I've changed int to an enum, and since it is a custom type, we need a few extra member functions in kernel_param so that we can keep target_device.h independent of kernel_helper.h.
I also fixed HeavyQuarkResidualNorm_ and xpyHeavyQuarkResidualNorm_, which received the same treatment as caxpyxmazMR_ now.
I think we should make the enum a enum class which is more error proofing. Also personally I am fine with the it but I remember the enum's in QUDA needs to all be upper case?
Yes, for consistency with QUDA, I would strongly prefer all caps for the enum. Also, instead of yes / no / always, I would prefer TRUE / FALSE / ALWAYS
Apologies if I was unclear @jxy. What I meant for the enum names was to use USE_KERNEL_ARG_FALSE, USE_KERNEL_ARG_TRUE, USE_KERNEL_ARG_ALWAYS.
This is enum class, now. The names are properly scoped. You actually can't directly use FALSE/TRUE/ALWAYS as it is without the enum class name and double colon, unless we move to c++20 and use using enum declaration.
That's a good point, that it's scoped so the prefix is unnecessary.
Latest discussion on portability call is that we should be able to delete the static_asserts in blas_core.cuh and reduce_core.cuh which should be redundant given the static_asserts in the target launcher.
@jcosborn to review this PR.
Removed static_assert as we discussed. I left comments in the constructors to remind ourselves in the future.
test this please
@Jenkins test this please