ziyuhuang123
Hi! It seems TACC is a platform that provides free GPU resources? Are there any limitations, such as disk size or GPU running time? Thanks!
Your idea is excellent, and I have starred your repo. I want to check whether my understanding is correct: this paper does not modify the kernel implementation but instead considers that...
Hi! I am a GPU researcher. Could you provide the GPU code? Thanks!
Hi! I am running example/attn/4090 on an RTX 4090 with nvcc 12.3 and gcc/g++ 10, but I get the error below:

```
(py_hzy_new) 4090-01% make
nvcc -ccbin=/home/zyhuang/miniconda3/envs/py_hzy_new/bin/g++ -DNDEBUG -Xcompiler=-fPIE --expt-extended-lambda --expt-relaxed-constexpr -Xcompiler=-Wno-psabi -Xcompiler=-fno-strict-aliasing --use_fast_math...
```
```
if (blockIdx.x == 0 && blockIdx.y == 1 && threadIdx.x == 0 && threadIdx.y == 0) {
    printf("enter tail-365\n");
}  // ----------> This is in one file

if (blockIdx.x == 0 && blockIdx.y == 1 && threadIdx.x == 383 && threadIdx.y == 0) {
    print(tCrA);
    printf(" after --- tCrA\n");
    printf("colle 932\n");
}  // ----------> This is in another file
```

As...
**What is your question?** https://github.com/NVIDIA/cutlass/blob/f7b19de32c5d1f3cedfc735c2849f12b537522ee/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized.hpp#L477-L554 I understand that parts 2 and 3 correspond to k_iter = 0 and k_iter in [1, k_end), respectively. However, what is the purpose of part 1? Why does...
**Describe the bug** In PTX, I noticed that

```
cp.async.bulk.dst.src.completion_mechanism [dstMem], [srcMem], size, [mbar]

.dst                  = { .shared::cluster }
.src                  = { .shared::cta }
.completion_mechanism = { .mbarrier::complete_tx::bytes }
```

...
**What is your question?** I am writing a class and want to create a private tensor member for later use, but I do not know how to create an empty tensor....
**What is your question?** In test/unit/pipeline/pipeline_tma_async_warp_specialized.cu, I see `pipeline.producer_commit`, and in cutlass/include/cutlass/gemm/collective/sm90_mma_tma_gmma_ss_warpspecialized.hpp, I see `pipeline.producer_get_barrier`. They are located at almost the same place, so I guess they serve the same purpose. Why? Any...
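One way to see why the two calls can sit in the same spot: both end up feeding the same transaction-byte count on a stage's "full" barrier, just through different actors. In the collective, the pointer from `producer_get_barrier` is handed to the TMA copy, which completes the bytes itself; our reading of `PipelineTmaAsync` is that `producer_commit` does essentially nothing outside the pipeline unit tests, where it completes the bytes by hand because no real TMA copy is issued. The sketch below is a toy, host-only model under the assumption of mbarrier phase semantics (a phase completes once all expected arrivals have happened and the pending byte count returns to zero). `ToyTxBarrier` and its methods are hypothetical names, not CUTLASS or PTX API.

```cpp
#include <cassert>
#include <cstdint>

// Toy, host-only model of an mbarrier with transaction-byte accounting.
// Hypothetical type: it mimics the idea behind `.mbarrier::complete_tx::bytes`,
// not the real hardware object or any CUTLASS class.
struct ToyTxBarrier {
  int expected_arrivals;  // arrivals needed per phase
  int pending_arrivals;   // arrivals still missing in the current phase
  int64_t tx_count = 0;   // bytes still expected in the current phase
  uint32_t phase = 0;     // flips each time a phase completes

  explicit ToyTxBarrier(int arrivals)
      : expected_arrivals(arrivals), pending_arrivals(arrivals) {}

  // Producer side: arrive once and announce how many bytes will land.
  void arrive_and_expect_tx(int64_t bytes) {
    assert(pending_arrivals > 0);
    tx_count += bytes;
    --pending_arrivals;
    try_complete_phase();
  }

  // Data-mover side. In a real kernel the TMA copy does this when it is given
  // the barrier pointer; a test without TMA would have to call it manually.
  void complete_tx(int64_t bytes) {
    tx_count -= bytes;
    try_complete_phase();
  }

 private:
  void try_complete_phase() {
    if (pending_arrivals == 0 && tx_count == 0) {
      phase ^= 1u;                           // consumers waiting on this phase may proceed
      pending_arrivals = expected_arrivals;  // re-arm for the next phase
    }
  }
};
```

In this model the only difference between the two paths is who calls `complete_tx`; the accounting the consumer waits on is identical.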
**What is your question?** https://github.com/NVIDIA/cutlass/blob/f7b19de32c5d1f3cedfc735c2849f12b537522ee/include/cutlass/detail/layout.hpp#L111-L117 data:image/s3,"s3://crabby-images/59bb6/59bb66db52d5a71e722171f364452c3cdd59a7e5" alt="b6ae8dc3b4fe301f456ee7c9f28b782" Both are RowMajor, yet for A the stride is (number, 1) while for B it is (1, number).
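A plausible explanation, assuming CUTLASS 3.x follows the CuTe GEMM convention of indexing B by (n, k) rather than (k, n): "RowMajor" only describes how the original matrix sits in memory, while the stride tuple depends on the order of the modes used to index it. The host-only sketch below uses hypothetical helper names (`Stride2`, `row_major_A_stride`, `row_major_B_stride_nk`), not CUTLASS API, to show the same row-major storage giving (K, 1) for A indexed by (m, k) and (1, N) for B indexed by (n, k).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: strides of a rank-2 tensor (mode 0, mode 1).
struct Stride2 {
  int64_t s0;
  int64_t s1;
};

// A is M x K, row-major: element (m, k) lives at m * K + k.
// Indexed as (m, k), the strides are (K, 1).
Stride2 row_major_A_stride(int64_t /*M*/, int64_t K) { return {K, 1}; }

// B is K x N, row-major: element (k, n) lives at k * N + n.
// Assumption: B is indexed as (n, k), as in CuTe-style GEMMs.
// Then the n-mode has stride 1 and the k-mode has stride N: (1, N).
Stride2 row_major_B_stride_nk(int64_t N, int64_t /*K*/) { return {1, N}; }

// Linear offset of element (i, j) under a given stride pair.
int64_t offset(Stride2 s, int64_t i, int64_t j) { return i * s.s0 + j * s.s1; }

// Check every element against the plain row-major address of the original matrix.
void check_row_major(int64_t M, int64_t N, int64_t K) {
  Stride2 a = row_major_A_stride(M, K);
  for (int64_t m = 0; m < M; ++m)
    for (int64_t k = 0; k < K; ++k)
      assert(offset(a, m, k) == m * K + k);

  Stride2 b = row_major_B_stride_nk(N, K);
  for (int64_t n = 0; n < N; ++n)
    for (int64_t k = 0; k < K; ++k)
      assert(offset(b, n, k) == k * N + n);
}
```

Under this assumption, a row-major B is "N-major" (the n index is contiguous), which is why its stride-1 entry moves to the first position.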