loopy
[codegen, bug]: Callee kernel name generation is incorrect
Kernel call sites assume that the name of the generated function is identical to the LoopKernel's name. However, the actual name of the non-entrypoint kernel is generated during linearization as
https://github.com/inducer/loopy/blob/f67b65ccb7b377430986cf59db0747e51dfe84e7/loopy/schedule/device_mapping.py#L35-L40
Two ways of fixing this:
1. For non-entrypoint kernels we should always emit `CallKernel` with name equal to its `LoopKernel`, OR,
2. While emitting the code for a `Call` expression node, query the translation unit to get its `CallKernel`.
I feel (1) is cleaner (and easier to implement), and it doesn't seem to make any assumption that might bite us in the future. Opinions?
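
To make the mismatch concrete, here is a minimal sketch using toy stand-ins rather than loopy's real classes. `ToyKernel`, the three helper functions, and the `"_0"` suffix are all hypothetical and only model the naming behavior described above. How entrypoint kernels are named in the last helper is likewise an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyKernel:
    # Hypothetical stand-in for a LoopKernel; not loopy's real class.
    name: str
    is_entrypoint: bool


def call_site_name(callee: ToyKernel) -> str:
    # Call sites assume the generated function is named after the kernel.
    return callee.name


def generated_name_current(knl: ToyKernel, suffix: str = "_0") -> str:
    # Models the reported behavior: a fresh name is derived during
    # linearization, so it can differ from knl.name. The suffix is made up.
    return knl.name + suffix


def generated_name_option1(knl: ToyKernel, suffix: str = "_0") -> str:
    # Option (1): a non-entrypoint kernel keeps exactly its own name.
    if knl.is_entrypoint:
        return knl.name + suffix  # assumed: entrypoints may still be renamed
    return knl.name


callee = ToyKernel(name="twice", is_entrypoint=False)

# Under the modeled current behavior the call site and definition disagree.
mismatch = call_site_name(callee) != generated_name_current(callee)

# Under option (1) they agree without consulting the translation unit.
fixed = call_site_name(callee) == generated_name_option1(callee)
```

With option (1) the call site needs no lookup at all, which is the sense in which it avoids extra assumptions. Option (2) would instead replace `call_site_name` with a query against the translation unit.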
Yep, agree that the name should be decided once and then not messed with.