YUKE WANG comments

Results: 19 comments of YUKE WANG

embedding_dim/BLK_H

I suggest you try both: shared memory is more efficient at runtime but requires more re-implementation of the kernel.

Cuda error

Hi @publiccoderepo, thanks for reaching out! May I know which GPU you are using and your CUDA/NVCC version?

This seems to be a problem with the GPU SM architecture targeted at compilation. You can try changing the command in https://github.com/YukeWang96/PPoPP22_QGTC/tree/master#install-qgtc-go-to-qgtc_module-then-run:
```
TORCH_CUDA_ARCH_LIST="7.5" python setup.py clean --all install
```
where...
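If you are unsure which architecture value to use, here is a minimal sketch, assuming PyTorch with CUDA is installed. It reads the compute capability of the local GPU with `torch.cuda.get_device_capability()` and reruns the install command with `TORCH_CUDA_ARCH_LIST` pinned to that architecture. The helper names (`arch_list_from_capability`, `build_env`, `detect_capability`, `rebuild_extension`) are hypothetical and not part of the repository.

```python
import os
import subprocess
import sys


def arch_list_from_capability(capability):
    """Format a (major, minor) compute capability as a TORCH_CUDA_ARCH_LIST value."""
    major, minor = capability
    return f"{major}.{minor}"


def build_env(capability, base_env=None):
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST set to one arch."""
    env = dict(os.environ if base_env is None else base_env)  # never mutates the input
    env["TORCH_CUDA_ARCH_LIST"] = arch_list_from_capability(capability)
    return env


def detect_capability():
    """Compute capability of the current CUDA device, e.g. (7, 5)."""
    import torch  # imported lazily so the helpers above work without PyTorch

    return torch.cuda.get_device_capability()


def rebuild_extension(capability, setup_dir="."):
    """Run `python setup.py clean --all install` with the pinned arch list."""
    # sys.executable keeps the build on the same interpreter that has PyTorch installed.
    cmd = [sys.executable, "setup.py", "clean", "--all", "install"]
    subprocess.run(cmd, cwd=setup_dir, env=build_env(capability), check=True)
```

Usage, run from the extension's directory: `rebuild_extension(detect_capability())`. Pinning a single architecture keeps the build short and makes it obvious which SM version the kernels were compiled for.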

Cuda error

@publiccoderepo After checking the CUDA documentation, I found that `bmmaBitOpAND` was not introduced until Ampere GPUs (sm >= 80). Sorry about that. Here is the reference: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html?highlight=bmmaBitOpAND#sub-byte-operations:~:text=bmmaBitOpAND%20%3D%202%20%20//%20compute_80%20minimum
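To make the difference concrete, here is a CPU reference in Python of what a 1-bit `bmma_sync` computes with the POPC accumulate op: each output is the accumulator plus the population count of the row/column bit operation. This is a minimal sketch, not the QGTC kernel; the bit-packing into Python ints, the string op names, and all function names are hypothetical. The last helper uses the identity popc(a ^ b) = popc(a) + popc(b) - 2 * popc(a & b), which suggests one possible (untested here) way to emulate AND on sm_75 using XOR plus precomputed popcounts.

```python
# Minimum compute capability per bit op, as listed in the CUDA C++ Programming
# Guide (sub-byte WMMA): bmmaBitOpXOR needs compute_75, bmmaBitOpAND compute_80.
MIN_CAPABILITY = {"XOR": (7, 5), "AND": (8, 0)}


def supports_bit_op(capability, op):
    """True if a GPU with this (major, minor) capability supports the bit op."""
    return tuple(capability) >= MIN_CAPABILITY[op]


def popc(x):
    """Population count of a non-negative Python int."""
    return bin(x).count("1")


def bmma_reference(a_rows, b_cols, c, op):
    """CPU reference for D = C + popc(A op B) on 1-bit operands.

    a_rows[i] packs the K bits of row i of A into an int; b_cols[j] packs
    the K bits of column j of B. c is the integer accumulator matrix.
    """
    bit_op = {"XOR": lambda x, y: x ^ y, "AND": lambda x, y: x & y}[op]
    return [
        [c[i][j] + popc(bit_op(a, b)) for j, b in enumerate(b_cols)]
        for i, a in enumerate(a_rows)
    ]


def and_popc_from_xor(a, b):
    """popc(a & b) recovered from XOR via popc(a ^ b) = popc(a) + popc(b) - 2 * popc(a & b)."""
    return (popc(a) + popc(b) - popc(a ^ b)) // 2


a_rows, b_cols = [0b1011, 0b0110], [0b1001, 0b1111]
zeros = [[0, 0], [0, 0]]
print(bmma_reference(a_rows, b_cols, zeros, "AND"))  # [[2, 3], [0, 2]]
print(bmma_reference(a_rows, b_cols, zeros, "XOR"))  # [[1, 1], [4, 2]]
```

On an Ampere or newer GPU, `supports_bit_op(torch.cuda.get_device_capability(), "AND")` returns True; on a Turing GPU reporting (7, 5) it returns False, which matches the compilation failure above.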

Any docs about this project?

Hi, thanks for your interest in our work! The paper for this project is currently under submission, and we will release this kind of information later on. Please check...

spmm kernel

Please follow the instructions in this subsection. https://github.com/YukeWang96/TC-GNN_ATC23#8-running-tc-gnn-in-single-kernel-comparison

spmm kernel

Hi, single-kernel profiling uses `dataset.dim` as the output feature size (N), so the input and output have the same dimension. See https://github.com/YukeWang96/TC-GNN_ATC23/blob/f50ffc491fb07bc78b7af8c0ffdec2b0bd7ec1a2/main_tcgnn.py#L35 and https://github.com/YukeWang96/TC-GNN_ATC23/blob/f50ffc491fb07bc78b7af8c0ffdec2b0bd7ec1a2/gnn_conv.py#L179
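A minimal pure-Python sketch of the shapes involved, assuming a CSR adjacency (the name `csr_spmm` and the toy graph are illustrative, not TC-GNN's API): the sparse adjacency A is num_nodes x num_nodes and the dense features X are num_nodes x dim, so Y = A X is num_nodes x dim. The output feature size N therefore equals the input feature size, which is the role `dataset.dim` plays in single-kernel profiling.

```python
def csr_spmm(row_ptr, col_idx, values, x):
    """Y = A @ X, with A in CSR form (num_nodes x num_nodes) and X dense (num_nodes x dim)."""
    num_nodes = len(row_ptr) - 1
    dim = len(x[0])  # input feature size; also the output feature size N
    y = [[0.0] * dim for _ in range(num_nodes)]
    for i in range(num_nodes):
        # Nonzeros of row i live in col_idx/values[row_ptr[i]:row_ptr[i + 1]].
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j, a_ij = col_idx[p], values[p]
            for k in range(dim):
                y[i][k] += a_ij * x[j][k]
    return y


# Path graph 0 - 1 - 2 (no self-loops), with dim = 2 features per node.
row_ptr, col_idx, values = [0, 1, 3, 4], [1, 0, 2, 1], [1.0, 1.0, 1.0, 1.0]
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = csr_spmm(row_ptr, col_idx, values, x)
print(len(y), len(y[0]))  # 3 2
```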

Cuda Graph optimization

Do these two SpMM functions correspond to the two-layer forward of the GCN model?
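For reference, assuming the model is a standard two-layer GCN, Z = Â · ReLU(Â X W1) · W2 with Â the precomputed (normalized, self-looped) adjacency, the forward pass contains exactly one SpMM per layer. The minimal pure-Python sketch below shows that structure in general, not the repository's code; `spmm`, `matmul`, `relu`, and `gcn_forward` are hypothetical names, and the adjacency is stored as per-node lists of (neighbor, weight) pairs.

```python
def spmm(adj, x):
    """Aggregate: output row i is the sum of w * x[j] over (j, w) in adj[i]."""
    dim = len(x[0])
    out = []
    for neighbors in adj:
        row = [0.0] * dim
        for j, w in neighbors:
            for k in range(dim):
                row[k] += w * x[j][k]
        out.append(row)
    return out


def matmul(x, w):
    """Dense X @ W on lists of lists."""
    return [[sum(xi[t] * w[t][c] for t in range(len(w))) for c in range(len(w[0]))] for xi in x]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def gcn_forward(adj, x, w1, w2, spmm_fn=spmm):
    # Layer 1: dense transform, then one SpMM aggregation, then ReLU.
    h = relu(spmm_fn(adj, matmul(x, w1)))
    # Layer 2: dense transform, then the second SpMM; no activation on the output.
    return spmm_fn(adj, matmul(h, w2))
```

Applying the weight before the SpMM, Â(XW) rather than (ÂX)W, gives the same result by associativity, but lets the SpMM run on the narrower feature width whenever a layer reduces the dimension.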

Improper use of CUDA Graph

Hi, thanks for reaching out and for bringing this to our attention! Our current observation is that CUDA Graph in PyTorch seems to have some problems supporting kernels with...