Triton-Puzzles
question about long softmax
I solved the long softmax puzzle, but I had to store the intermediate results in z_ptr, which may cause unnecessary memory I/O.
Essentially, I would like to know whether there is a way in Triton to create a temporary array in shared memory and store intermediate results there.
There is 😃
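For reference, here is a rough sketch of one way to avoid the intermediate writes entirely. It is not necessarily what the reply above has in mind, and it does not touch the shared-memory question itself. It is plain Python standing in for the kernel's loop over column blocks, and the function name `long_softmax_row` and the `block` parameter are made up for illustration. The idea is the "online" softmax: keep a running max and a running sum as two scalars per row, rescale the sum whenever the max changes, then re-read the input once to write the final values.

```python
import math

def long_softmax_row(row, block=4):
    """Softmax over one row, read in chunks of `block` elements.

    Hypothetical helper: emulates a block-wise kernel loop in plain Python.
    """
    running_max = float("-inf")
    running_sum = 0.0

    # Pass 1: only two scalars are carried between chunks, nothing is stored.
    for start in range(0, len(row), block):
        chunk = row[start:start + block]
        new_max = max(running_max, max(chunk))
        # Rescale the old sum to the new maximum, then add this chunk's terms.
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(v - new_max) for v in chunk
        )
        running_max = new_max

    # Pass 2: re-read the input and produce each output value exactly once.
    out = []
    for start in range(0, len(row), block):
        chunk = row[start:start + block]
        out.extend(math.exp(v - running_max) / running_sum for v in chunk)
    return out
```

In a Triton kernel the two running values would presumably live in per-program tensors carried across the loop, with z_ptr written only in the second pass, at the cost of loading the input twice.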