[QST] How to make mainloop fusion work for floats
What is your question?
The mainloop fusion examples 25_ampere_fprop_mainloop_fusion and 26_ampere_wgrad_mainloop_fusion use half precision (float16). I want to adapt them to single precision (float32). I changed the element types and tile shapes to support floats (roughly as in the first sketch below), but the examples fail. To understand why, I examined scale_bias_relu_transform.h, and I believe changes are needed there as well. Could anyone guide me on how to get correct results with floats?

Additionally, the activation function used in these examples is ReLU. Is it possible to implement LeakyReLU in the mainloop fusion instead (something like the second sketch below)? If so, how can this be done?
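For concreteness, this is roughly the kind of change I made to example 25. The identifiers follow that example; the reduced K extents and the tf32 instruction shape are my guesses based on other fp32 tensor-op configurations, not values I know to be correct:

```cpp
#include "cutlass/cutlass.h"
#include "cutlass/gemm/gemm.h"

// Element types changed from half precision to single precision.
using ElementInputA      = float;  // was cutlass::half_t
using ElementInputB      = float;  // was cutlass::half_t
using ElementOutput      = float;
using ElementAccumulator = float;

// Tile shapes adjusted for fp32/tf32 tensor ops (K extents halved relative
// to the fp16 example; these particular values are my assumption).
using ThreadblockShape = cutlass::gemm::GemmShape<128, 128, 32>;
using WarpShape        = cutlass::gemm::GemmShape<64, 64, 32>;
using InstructionShape = cutlass::gemm::GemmShape<16, 8, 8>;  // tf32 mma shape
```

With these changes the examples still fail, which is why I suspect scale_bias_relu_transform.h also needs an fp32 path.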
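For the LeakyReLU question, the per-element transform I have in mind is sketched below. scale_bias_leaky_relu is an illustrative name of my own, not an existing CUTLASS function, and I don't know whether a plain per-element branch like this fits the way scale_bias_relu_transform.h applies the activation:

```cpp
#include <cuda_runtime.h>

// Hypothetical fused transform: scale + bias followed by LeakyReLU.
// The existing file fuses scale + bias + ReLU; this swaps the activation.
__device__ __forceinline__ float scale_bias_leaky_relu(
    float x, float scale, float bias, float alpha) {
  float y = fmaf(scale, x, bias);     // scale * x + bias
  return (y > 0.0f) ? y : alpha * y;  // LeakyReLU with negative slope alpha
}
```

Is this the right direction, or does the mainloop fusion path require the activation to have a special form?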