YUKE WANG
I suggest you try both: shared memory is more efficient at runtime but requires re-implementing more of the kernel.
Hi @publiccoderepo, thanks for reaching out. May I know which GPU you use and your CUDA/NVCC version?
This seems to be a problem with the GPU SM architecture targeted at compile time. You can try changing the command in https://github.com/YukeWang96/PPoPP22_QGTC/tree/master#install-qgtc-go-to-qgtc_module-then-run as follows:
```
TORCH_CUDA_ARCH_LIST="7.5" python setup.py clean --all install
```
where...
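As a rough sketch of how the value for `TORCH_CUDA_ARCH_LIST` could be chosen, the snippet below formats a compute capability such as `(7, 5)` into the string `"7.5"`. The helper names `arch_list_value` and `detect_capability` are hypothetical and not part of QGTC; the optional lookup relies on `torch.cuda.get_device_capability()`, so it assumes PyTorch is installed and a GPU is visible.

```python
def arch_list_value(capability):
    """Format a (major, minor) compute capability as a string, e.g. "7.5"."""
    major, minor = capability
    return f"{major}.{minor}"


def detect_capability():
    """Return the (major, minor) capability of GPU 0, or None if unavailable."""
    try:
        import torch  # assumption: PyTorch is the framework used for the build
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device visible; set TORCH_CUDA_ARCH_LIST manually.")
    else:
        # Print the install command with the detected architecture filled in.
        print(f'TORCH_CUDA_ARCH_LIST="{arch_list_value(cap)}" '
              "python setup.py clean --all install")
```

Running the script on the build machine prints the install command with the detected architecture, which can then be compared with the value used above.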
@publiccoderepo After checking the CUDA documentation, I found that `bmmaBitOpAND` was not introduced until Ampere GPUs (sm >= 80). Sorry about that. Here is the reference: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html?highlight=bmmaBitOpAND#sub-byte-operations:~:text=bmmaBitOpAND%20%3D%202%20%20//%20compute_80%20minimum
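To make the requirement concrete, here is a minimal check that assumes only what the CUDA programming guide states: `bmmaBitOpAND` needs compute capability 8.0 or higher. The constant and function names are illustrative and do not come from QGTC.

```python
# Minimum compute capability for bmmaBitOpAND, taken from the
# "compute_80 minimum" note in the CUDA C Programming Guide.
BMMA_AND_MIN_CAPABILITY = (8, 0)


def supports_bmma_and(capability):
    """Return True if a (major, minor) capability can use bmmaBitOpAND."""
    return tuple(capability) >= BMMA_AND_MIN_CAPABILITY


# An sm_75 GPU fails the check, while sm_80 and newer pass it.
for cap in [(7, 5), (8, 0), (8, 6)]:
    status = "supported" if supports_bmma_and(cap) else "not supported"
    print(f"sm_{cap[0]}{cap[1]}: bmmaBitOpAND {status}")
```

The tuple comparison is lexicographic, so the major version is compared before the minor one.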
Hi, thanks for your interest in our work! The paper for this project is currently under submission, and yes, we will release this kind of information later on. Please check...
Please follow the instructions in this subsection. https://github.com/YukeWang96/TC-GNN_ATC23#8-running-tc-gnn-in-single-kernel-comparison
Hi, single-kernel profiling uses `dataset.dim` as the output feature size (N), where the input and output have the same dimension. https://github.com/YukeWang96/TC-GNN_ATC23/blob/f50ffc491fb07bc78b7af8c0ffdec2b0bd7ec1a2/main_tcgnn.py#L35 https://github.com/YukeWang96/TC-GNN_ATC23/blob/f50ffc491fb07bc78b7af8c0ffdec2b0bd7ec1a2/gnn_conv.py#L179
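The shape relationship can be illustrated with a tiny pure-Python SpMM over a CSR adjacency matrix. This is only a reference sketch, not the TC-GNN kernel: `spmm_csr` and the toy graph are made up for illustration, and the number of feature columns stands in for `dataset.dim`.

```python
def spmm_csr(row_ptr, col_idx, features):
    """Multiply an unweighted CSR adjacency matrix by a dense feature matrix."""
    num_nodes = len(row_ptr) - 1
    dim = len(features[0])  # stands in for dataset.dim (N)
    out = [[0.0] * dim for _ in range(num_nodes)]
    for row in range(num_nodes):
        # Sum the feature rows of every neighbor listed for this node.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            neighbor = col_idx[k]
            for j in range(dim):
                out[row][j] += features[neighbor][j]
    return out


# Toy graph with 3 nodes: node 0 has neighbors 1 and 2,
# node 1 has neighbor 0, node 2 has neighbor 1.
row_ptr = [0, 2, 3, 4]
col_idx = [1, 2, 0, 1]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 nodes, dim = 2

out = spmm_csr(row_ptr, col_idx, features)
print(len(out), len(out[0]))  # output has the same shape as the input features
```

The output keeps one row per node and `dim` columns, so the input and output feature matrices have the same dimension.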
Do these two SpMM functions correspond to the two-layer forward of the GCN model?
Hi, thanks for reaching out and for bringing this to our attention! Our current observation is that CUDA graphs in PyTorch seem to have some problems supporting kernels with...