pykan
KAN takes significant time for inference with CUDA?
I set up a KAN model and found that inference is much slower with CUDA. Here is my test code:
from kan import KAN
import torch
import time
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# without cuda
start = time.time()
model = KAN(width=[768,64,2], grid=5, k=3)
x = torch.normal(0,0.5,size=(4,768))
y = model(x)
end = time.time()
print(end - start) # 3.04 s
# with cuda
start = time.time()
model = KAN(width=[768,64,2], grid=5, k=3, device=device)
x = torch.normal(0,0.5,size=(4,768)).to(device)
y = model(x)
end = time.time()
print(end - start) # 10.9s
What's happening? Am I missing something here?
Simply put, imho, neither the model nor the input x passed to it is large enough for the parallelism to outweigh the overhead of moving that data to CUDA.
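On top of the transfer overhead, the timing above also charges model construction and one-time CUDA initialization to the forward pass, and CUDA kernel launches are asynchronous, so time.time() can stop before the GPU finishes. A minimal sketch of a fairer measurement, using a plain nn.Sequential stand-in with the same widths (the stand-in and its layer sizes are assumptions, so pykan is not required to run it):

```python
import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the KAN model with the same 768 -> 64 -> 2 widths;
# the timing methodology is what matters here, not the architecture.
model = nn.Sequential(nn.Linear(768, 64), nn.SiLU(), nn.Linear(64, 2)).to(device)
x = torch.normal(0, 0.5, size=(4, 768)).to(device)

# Warm-up: the first CUDA call pays one-time context-setup costs that
# should not be attributed to steady-state inference.
with torch.no_grad():
    model(x)
if device.type == "cuda":
    torch.cuda.synchronize()  # drain queued kernels before starting the clock

start = time.time()
with torch.no_grad():
    y = model(x)
if device.type == "cuda":
    torch.cuda.synchronize()  # launches are async; sync for a fair stop time
elapsed = time.time() - start
print(f"steady-state forward pass: {elapsed * 1e3:.3f} ms")
```

Measured this way, the per-call difference between CPU and CUDA for a batch of 4 is tiny; the 3 s and 10.9 s figures are dominated by setup, not by the forward pass itself.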
Just fixed a bunch of CUDA-related issues, and CUDA now runs much faster (a 20x speedup) than the CPU for a [4,100,100,100,1] KAN: https://github.com/KindXiaoming/pykan/blob/master/tutorials/API_10_device.ipynb
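The speedup shows up once the workload is big enough to saturate the GPU. A sketch of that crossover, again with a plain MLP stand-in roughly matching the [4,100,100,100,1] shape (the stand-in, layer sizes, and batch size are assumptions, not pykan's implementation); the CUDA branch only runs where a GPU is available:

```python
import time
import torch
import torch.nn as nn

def bench(model, x, iters=10):
    """Average seconds per forward pass, synchronizing if on CUDA."""
    dev = next(model.parameters()).device
    with torch.no_grad():
        model(x)  # warm-up pass, excluded from the measurement
    if dev.type == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        for _ in range(iters):
            model(x)
    if dev.type == "cuda":
        torch.cuda.synchronize()
    return (time.time() - start) / iters

def make_model():
    # Stand-in roughly matching the [4,100,100,100,1] KAN width list.
    return nn.Sequential(
        nn.Linear(4, 100), nn.SiLU(),
        nn.Linear(100, 100), nn.SiLU(),
        nn.Linear(100, 100), nn.SiLU(),
        nn.Linear(100, 1),
    )

# A large batch gives the GPU enough parallel work to amortize overhead.
x_cpu = torch.normal(0, 0.5, size=(10000, 4))
cpu_time = bench(make_model(), x_cpu)
print(f"cpu:  {cpu_time * 1e3:.2f} ms/iter")

if torch.cuda.is_available():
    gpu_time = bench(make_model().to("cuda"), x_cpu.to("cuda"))
    print(f"cuda: {gpu_time * 1e3:.2f} ms/iter")
```

With a batch of 4, the same comparison typically favors the CPU, which is consistent with the small-input explanation above.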