TensorComprehensions
TC ops fail to end when calculation is finished
I am experimenting with a simple TC op. The calculation finishes quickly, but the program then hangs and never exits. Both the GPU and CPU remain at full utilization even after the calculation is finished. The example matmul code works fine. I am wondering whether I am doing something wrong or whether this is a bug.
Below is the information:
- OS: CentOS 7
- How you installed TC: conda
- Python version: 3.6.5
- Conda version: 4.4.11
Here is the code.
import tensor_comprehensions as tc
import torch
import timeit

lang = """
def pairsum(float(L, M, 3) A, float(L, M, 3) B) -> (out) {
    out(l) +=! A(l, i, 0) - A(l, j, 0) +
               A(l, i, 1) - A(l, j, 1) +
               A(l, i, 2) - A(l, j, 2) +
               B(l, i, 0) - B(l, j, 0) +
               B(l, i, 1) - B(l, j, 1) +
               B(l, i, 2) - B(l, j, 2)
}
"""
pairsum = tc.define(lang, name="pairsum")
mat1, mat2 = torch.randn(32, 1536, 3).cuda(), torch.randn(32, 1536, 3).cuda()

def test():
    out = pairsum(mat1, mat2)

print(timeit.timeit(test, number=1000))
print("test finished")
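For reference, here is a sketch of what the comprehension computes in plain PyTorch. This is a hypothetical equivalent written for illustration (the name `pairsum_ref` is not part of TC), not the kernel TC generates:

```python
import torch

def pairsum_ref(A, B):
    # Collapse the three channels first: C[l, m] = sum_k (A[l, m, k] + B[l, m, k])
    C = (A + B).sum(dim=2)                  # shape (L, M)
    # Pairwise differences C[l, i] - C[l, j] for all i, j
    diff = C.unsqueeze(2) - C.unsqueeze(1)  # shape (L, M, M)
    # Reduce over both free indices i and j, matching the +=! reduction
    return diff.sum(dim=(1, 2))             # shape (L,)
```

Note that this particular reduction telescopes: summing C[l, i] - C[l, j] over all i, j cancels analytically, so the result is zero for every l up to floating-point error.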
Thanks @Junonia for the report. My guess is that the GPU might be held because the kernel is very slow or the compilation is stuck.
Tentatively passing to @ftynse to see if he has ideas, or knows who might, about what's going on here. Please feel free to assign it back to me afterwards. :) Thanks
Hmmm, I've seen Python processes from dead PyTorch sessions showing up in nvidia-smi even without TC.
@prigoyal does the call to a TC function block and wait for the CUDA kernel to complete? The only guess I have is that something returns early and the kernel keeps running. In this code, out is never read, so there is no guarantee that the computation has actually terminated.
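Since CUDA kernel launches in PyTorch are asynchronous, a timing loop that never reads the output can return before the kernels have finished. A minimal sketch of a timing helper that forces synchronization (the `timed` wrapper is introduced here for illustration; it falls back to plain timing when no GPU is present):

```python
import timeit
import torch

def timed(fn, number=100):
    """Time fn, forcing all queued CUDA work to finish before stopping the clock."""
    def wrapper():
        fn()
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # block until every queued kernel completes
    return timeit.timeit(wrapper, number=number)

# With the TC op from the report, usage would be:
#   print(timed(lambda: pairsum(mat1, mat2), number=1000))
```

This way the reported time includes the kernel execution itself, and a kernel that never terminates would show up as the timing call hanging rather than the program hanging after "test finished" is printed.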