Weikai Tang
In 2_autograd_tutorial.ipynb, in the gradients section, the original text reads: "Backpropagation: because out is a scalar (纯量), out.backward() is equivalent to out.backward(torch.tensor(1))." Running out.backward(torch.tensor(1)) fails with: RuntimeError: Expected isFloatingType(grads[i].type().scalarType()) to be true, but got false. Also, I think "scalar" is better translated as 标量 than 纯量. Suggested revision: "Backpropagation: because out is a scalar (标量), out.backward() is equivalent to out.backward(torch.tensor(1.))."
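A minimal sketch of the reported behavior; the tensor x and the expression for out are illustrative and not taken from the original notebook:

```python
import torch

# Illustrative setup (not the notebook's exact code): build a scalar output.
x = torch.ones(2, 2, requires_grad=True)
out = (x * 3).mean()  # out is a 0-dimensional (scalar) tensor

# out.backward(torch.tensor(1))  # integer gradient tensor -> RuntimeError about floating type
out.backward(torch.tensor(1.))   # floating-point gradient works; equivalent to out.backward()

print(x.grad)  # each entry is d(out)/dx = 3/4 = 0.75
```

The gradient argument passed to backward() must be a floating-point tensor, so writing the literal as 1. (rather than 1) avoids the error.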
Section 10.4.3.2, Algorithm Optimization, says: "Section 5.3.3 above has already given a detailed introduction to optimizing the matrix-multiplication (GEMM) operation." Where can I find that content? Thanks!
Hi Yuke, I read your paper and code; excellent work, of course, and thanks for the repository. I have a question about the CUDA kernel. I can see you computing the GCN forward...
In my opinion, when loading data from global memory to shared memory (i.e., writing to shared memory) with vectorized access, because of the transposition, threads within a warp may write the same...