Yunsong Wang
cc @sleeepyjack
@srinivasyadav18 can you please share the performance comparisons before and after this PR?
Update: waiting for #15700 to merge to determine if the current PR can improve performance after resolving the register issues.
/ok to test
@willtryagain Thank you for your contribution! The CI test failure is caused by an issue with CCCL (https://godbolt.org/z/94nYnvqMW), which may take some time to resolve. Could you please keep the...
/ok to test
> Hello @PointKernel, can you please tell me if the check is failing because of a problem in my code, or is it still `cuda::std::ceil`?

Thank you for following up. It's...