libCEED
Steps to develop kernel fusion for Q-functions implemented in Python
I would appreciate it if someone could advise on the following:
I am working on kernel fusion for Q-functions implemented in Python, and possibly in other languages. Environment: CUDA 12.2; Clang 19. Compiling the CUDA C++ code with clang to obtain a *.ptx file was successful, and I then loaded the *.ptx file with cuModuleLoad. What is the next step?
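To be concrete about what I mean by kernel fusion, here is a small CPU-only Python sketch. It is only an illustration under my own assumptions: every function name is hypothetical and none of it is libCEED API. It uses a toy mass-like operator in which a basis interpolation, a pointwise Q-function, and the transposed basis action run either as three separate passes or as one fused loop.

```python
# CPU-only illustration of kernel fusion.
# All names here are hypothetical; none of this is libCEED API.

def interp_to_quadrature(u_nodes, interp):
    """Basis action: interpolate nodal values to quadrature points."""
    return [sum(b * u for b, u in zip(row, u_nodes)) for row in interp]


def mass_qfunction(u_q, w_q):
    """Pointwise Q-function: scale each value by its quadrature weight."""
    return [w * u for w, u in zip(w_q, u_q)]


def apply_transpose(v_q, interp):
    """Transposed basis action: map quadrature values back to nodes."""
    num_nodes = len(interp[0])
    return [
        sum(interp[q][i] * v_q[q] for q in range(len(interp)))
        for i in range(num_nodes)
    ]


def apply_unfused(u_nodes, interp, w_q):
    """Three separate passes, each materializing an intermediate list."""
    u_q = interp_to_quadrature(u_nodes, interp)
    v_q = mass_qfunction(u_q, w_q)
    return apply_transpose(v_q, interp)


def apply_fused(u_nodes, interp, w_q):
    """One pass over quadrature points; intermediates stay in local scalars."""
    v_nodes = [0.0] * len(u_nodes)
    for row, w in zip(interp, w_q):
        u = sum(b * x for b, x in zip(row, u_nodes))  # basis action at this point
        v = w * u                                     # Q-function at this point
        for i, b in enumerate(row):                   # transposed basis action
            v_nodes[i] += b * v
    return v_nodes


if __name__ == "__main__":
    interp = [[0.75, 0.25], [0.25, 0.75]]  # 2 quadrature points x 2 nodes
    w_q = [0.5, 0.5]
    u_nodes = [1.0, 3.0]
    print(apply_unfused(u_nodes, interp, w_q))
    print(apply_fused(u_nodes, interp, w_q))
```

Both functions compute the same result; the fused version simply avoids storing the quadrature-point arrays between steps. That is the behavior I would like to obtain on the GPU when the Q-function body comes from Python.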
What are the remaining required steps, including environment configuration, that will lead to kernel fusion for Q-functions? Should the kernel fusion code be written in CUDA C++, CUDA Python, or another language, and should it use the libCEED API? Note: Defining User Q-Functions is one of the main documents I have read regarding Q-functions. Are there additional documents I could be referred to? Thanks.