plant310

Results: 2 issues by plant310

Hi, I read the SpMM kernel and found that the number of tiles to process along the feature dimension is calculated as embedding_dim/BLK_H, which may not be applicable for the last...
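The preview above is cut off, so the exact concern is not visible. One plausible reading is that integer division drops a partial tile when embedding_dim is not a multiple of BLK_H. A minimal sketch of that arithmetic follows; the value BLK_H = 16 and both helper names are assumptions for illustration and are not taken from the kernel itself.

```python
# Illustration only: compares floor and ceiling division for a tile count.
# BLK_H = 16 and the function names are assumed, not taken from the kernel.

BLK_H = 16  # assumed tile height along the feature dimension


def num_tiles_floor(embedding_dim: int, blk_h: int = BLK_H) -> int:
    """Tile count using plain integer division (embedding_dim / BLK_H)."""
    return embedding_dim // blk_h


def num_tiles_ceil(embedding_dim: int, blk_h: int = BLK_H) -> int:
    """Tile count rounded up, so a trailing partial tile is included."""
    return (embedding_dim + blk_h - 1) // blk_h


def uncovered_columns(embedding_dim: int, blk_h: int = BLK_H) -> int:
    """Feature columns left unprocessed when the floor count is used."""
    return embedding_dim - num_tiles_floor(embedding_dim, blk_h) * blk_h


if __name__ == "__main__":
    for dim in (64, 70):
        print(dim, num_tiles_floor(dim), num_tiles_ceil(dim), uncovered_columns(dim))
```

If the kernel does round up like this, the last tile would also need a bounds check (or padded input) so that it does not read or write past embedding_dim.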

Hi, I used Nsight Systems to view the timeline after enabling CUDA graph and found that the SpMM kernels in the forward and backward passes were clustered together, which seems to...