About the make_tensor function
`template <class ProblemShape, class CtaTiler,
class TA, class AStride,
class TB, class BStride,
class TC, class CStride>
__global__ static
void
mak_tensor(ProblemShape shape_MNK, CtaTiler cta_tiler,
TA const* A, AStride dA,
TB const* B, BStride dB,
TC const* C, CStride dC
)
{
using namespace cute;
// Preconditions
//
// Full and Tiled Tensors
//

// Represent the full tensors
Tensor mA = make_tensor(make_gmem_ptr(A), select<0,2>(shape_MNK), dA); // (M,K)
Tensor mB = make_tensor(make_gmem_ptr(B), select<1,2>(shape_MNK), dB); // (N,K)
Tensor mC = make_tensor(make_gmem_ptr(C), select<0,1>(shape_MNK), dC); // (M,N)

// Get the appropriate blocks for this thread block
auto cta_coord = make_coord(blockIdx.x, blockIdx.y, _); // (m,n,k)
Tensor gA = local_tile(mA, cta_tiler, cta_coord, Step<_1, X,_1>{}); // (BLK_M,BLK_K,k)
Tensor gB = local_tile(mB, cta_tiler, cta_coord, Step< X,_1,_1>{}); // (BLK_N,BLK_K,k)
Tensor gC = local_tile(mC, cta_tiler, cta_coord, Step<_1,_1, X>{}); // (BLK_M,BLK_N)

#if 1
if(thread0()) {
  print(" mB : "); print( mB); print("\n");
  // print(" gB : "); print( gB); print("\n");
  // print(" sB : "); print( (sB)); print("\n");
  // print("tBgB : "); print(tBgB); print("\n");
  // print("tBsB : "); print(tBsB); print("\n");
  // print("tArA : "); print(tArA); print("\n");
}
#endif
// (BLK_N,BLK_K)
}`
cudaErrorLaunchFailure: unspecified launch failure
I don't understand why a make_tensor call inside a kernel function must be given the ld (stride) parameter, otherwise it reports an error, while a call outside the kernel function works without the ld parameter. I hope you can give me an answer. Thank you.
I don't quite understand what your issue is from the description above. As far as I can see, there is only one kind of make_tensor call here:
Tensor mA = make_tensor(make_gmem_ptr(A), select<0,2>(shape_MNK), dA); // (M,K)
Tensor mB = make_tensor(make_gmem_ptr(B), select<1,2>(shape_MNK), dB); // (N,K)
Tensor mC = make_tensor(make_gmem_ptr(C), select<0,1>(shape_MNK), dC); // (M,N)
The other one is just your global function name. Could you state your issue more specifically?
What I'm trying to say is that make_tensor creates a tensor of the same shape in both cases, yet sometimes it works, sometimes it doesn't. Here I'm using make_tensor inside a CUDA kernel, where I have to provide the ld arguments, otherwise I get an error. If I call make_tensor inside the main function instead of inside the kernel, it succeeds without the ld arguments. I don't know why.
> sometimes it works, sometimes it doesn't
What does this mean?
This launch error likely has nothing to do with make_tensor. Please provide your full repro steps, starting from cd-ing into the build directory.
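On the ld question itself, a possible explanation, offered as a sketch rather than a diagnosis of this particular failure: as far as I know, a make_tensor call that is given only a shape falls back to a compact column-major layout, so leaving out the stride is only safe when the buffer really is stored that way. The plain C++ below does not use CuTe at all. `offset_2d`, `make_padded_matrix`, `read_with_ld`, and `read_compact` are hypothetical helpers made up for illustration; they only mimic the coordinate-to-offset arithmetic of a strided 2-D layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not a CuTe API: linear offset of element (i, j)
// under strides (stride_i, stride_j).
std::size_t offset_2d(std::size_t i, std::size_t j,
                      std::size_t stride_i, std::size_t stride_j) {
    return i * stride_i + j * stride_j;
}

// Builds a column-major M x K matrix stored with leading dimension ld >= M.
// Element (i, j) holds 10*i + j; the padding rows hold -1.
std::vector<int> make_padded_matrix(std::size_t M, std::size_t K, std::size_t ld) {
    assert(ld >= M);
    std::vector<int> buf(ld * K, -1);
    for (std::size_t j = 0; j < K; ++j)
        for (std::size_t i = 0; i < M; ++i)
            buf[offset_2d(i, j, 1, ld)] = static_cast<int>(10 * i + j);
    return buf;
}

// Reads (i, j) using the explicit stride (1, ld).
int read_with_ld(const std::vector<int>& buf,
                 std::size_t i, std::size_t j, std::size_t ld) {
    return buf[offset_2d(i, j, 1, ld)];
}

// Reads (i, j) using the compact column-major stride (1, M), which is
// what a shape-only layout would assume.
int read_compact(const std::vector<int>& buf,
                 std::size_t i, std::size_t j, std::size_t M) {
    return buf[offset_2d(i, j, 1, M)];
}
```

With M = 4, K = 3 and ld = 8, `read_with_ld(buf, 2, 1, 8)` returns 21, the intended element, while `read_compact(buf, 2, 1, 4)` lands on a padding slot and returns -1. When ld equals M, both reads agree, which is one way the same shape can appear to work in one place and not in another.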