TC-GNN_ATC23
CUDA Graph optimization
Hi, I used Nsight Systems to view the timeline after enabling CUDA Graph. The SpMM kernels from the forward and backward passes were clustered together, which seems to break the program's logic. Is there any solution for this?
Do these two SpMM functions correspond to the two-layer forward of the GCN model?
The dependency between the combination and aggregation operations seems to be broken. I also compared the test accuracy with and without the CUDA Graph optimization: with CUDA Graph enabled, the test accuracy drops to a very low level.