
AssertionError: Torch not compiled with CUDA enabled (on a CPU-only PC)

Open yotitinogs opened this issue 9 months ago • 8 comments


AssertionError                            Traceback (most recent call last)
Cell In[26], line 10
      7 device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
      8 print(device)
---> 10 model = KAN(width=[15,6,3,1], grid=5, k=3,device=device) #, grid_range=(0,1)) #, seed=0) # noise_scale_base = 0., base_fun = lambda x: x, noise_scale = 0)
     12 # model = KAN(width=[12,6,3,1], grid=GBest, k=kBest) #, seed=0) # noise_scale_base = 0., base_fun = lambda x: x, noise_scale = 0)

File c:\Users\thnog\AppData\Local\Programs\Python\Python311\Lib\site-packages\kan\KAN.py:140, in KAN.__init__(self, width, grid, k, noise_scale, noise_scale_base, base_fun, symbolic_enabled, bias_trainable, grid_eps, grid_range, sp_trainable, sb_trainable, device, seed)
    137 for l in range(self.depth):
    138     # splines
    139     scale_base = 1 / np.sqrt(width[l]) + (torch.randn(width[l] * width[l + 1], ) * 2 - 1) * noise_scale_base
--> 140     sp_batch = KANLayer(in_dim=width[l], out_dim=width[l + 1], num=grid, k=k, noise_scale=noise_scale, scale_base=scale_base, scale_sp=1., base_fun=base_fun, grid_eps=grid_eps, grid_range=grid_range, sp_trainable=sp_trainable,
    141                         sb_trainable=sb_trainable, device=device)
    142     self.act_fun.append(sp_batch)
    144     # bias

File c:\Users\thnog\AppData\Local\Programs\Python\Python311\Lib\site-packages\kan\KANLayer.py:126, in KANLayer.__init__(self, in_dim, out_dim, num, k, noise_scale, scale_base, scale_sp, base_fun, grid_eps, grid_range, sp_trainable, sb_trainable, device)
    124     self.scale_base = torch.nn.Parameter(torch.ones(size, device=device) * scale_base).requires_grad_(sb_trainable) # make scale trainable
    125 else:
--> 126     self.scale_base = torch.nn.Parameter(torch.FloatTensor(scale_base).cuda()).requires_grad_(sb_trainable)
    127 self.scale_sp = torch.nn.Parameter(torch.ones(size, device=device) * scale_sp).requires_grad_(sp_trainable) # make scale trainable
    128 self.base_fun = base_fun

File c:\Users\thnog\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\cuda\__init__.py:284, in _lazy_init()
    ...
    286 raise AssertionError(
    287     "libcudart functions unavailable. It looks like you have a broken build?"
    288 )

AssertionError: Torch not compiled with CUDA enabled

yotitinogs avatar May 08 '24 01:05 yotitinogs

If you have a GPU that supports CUDA, torch.device('cuda' if torch.cuda.is_available() else 'cpu') will return 'cuda'. That said, if you didn't install a CUDA-enabled PyTorch build (+cu121 or similar), you'll get this error. Please follow the official PyTorch installation documentation.

AlessandroFlati avatar May 08 '24 04:05 AlessandroFlati
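
A quick way to see which situation applies is to inspect the installed build directly. The snippet below is only a sketch: `describe_torch_build` is a hypothetical helper name, and it relies on `torch.version.cuda` being `None` for CPU-only builds. Note that on a CPU-only build `torch.cuda.is_available()` returns False, so the device line in the notebook selects 'cpu'; the traceback above fails later, at a hard-coded `.cuda()` call.

```python
import importlib.util


def describe_torch_build():
    """Summarize whether the installed PyTorch build can use CUDA.

    Hypothetical helper for illustration; not part of pykan or PyTorch.
    """
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment at all.
        return {"installed": False, "built_with_cuda": False, "cuda_usable": False}

    import torch

    return {
        "installed": True,
        # torch.version.cuda is None for CPU-only builds.
        "built_with_cuda": torch.version.cuda is not None,
        # True only if CUDA support is compiled in and a usable GPU/driver is present.
        "cuda_usable": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

If `built_with_cuda` is False, any explicit `.cuda()` call raises the AssertionError from the original report, regardless of the `device` argument.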

It's a minor issue in KANLayer.py:

        if isinstance(scale_base, float):
            self.scale_base = torch.nn.Parameter(torch.ones(size, device=device) * scale_base).requires_grad_(sb_trainable)  # make scale trainable
        else:
            self.scale_base = torch.nn.Parameter(torch.FloatTensor(scale_base).cuda()).requires_grad_(sb_trainable)

Just remove the .cuda() from the last line, or add a conditional that moves the tensor to CUDA only if device == 'cuda'. (This seems to have been fixed in master already, so just git pull again.)

fermisea avatar May 08 '24 09:05 fermisea
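
A device-aware variant of that branch could look like the sketch below. It is not the code from the upstream fix: `make_scale_base` is a hypothetical standalone helper, and it uses `torch.as_tensor(...).to(device)` in place of the hard-coded `.cuda()` so the parameter follows whatever `device` the caller passes.

```python
import torch


def make_scale_base(scale_base, size, device="cpu", trainable=True):
    """Build the scale_base parameter on the requested device.

    Hypothetical helper mirroring the two branches quoted above.
    """
    if isinstance(scale_base, float):
        # Scalar case: broadcast the value over `size` entries on `device`.
        values = torch.ones(size, device=device) * scale_base
    else:
        # Array-like case: convert to float32, then move to `device`
        # instead of calling .cuda() unconditionally.
        values = torch.as_tensor(scale_base, dtype=torch.float32).to(device)
    return torch.nn.Parameter(values).requires_grad_(trainable)


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    param = make_scale_base(torch.randn(6), size=6, device=device)
    print(param.device)
```

Passing `device` through in both branches keeps the parameter on the same device as the rest of the layer, whether or not CUDA is available.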

It seems that you're using pykan v0.0.3, and the issue you encountered has been fixed in PR #98. However, it hasn't been released yet.

@KindXiaoming, could you please make a new release on the master branch to incorporate the fix and resolve the issue? Thanks in advance!

Jim137 avatar May 08 '24 10:05 Jim137

Hi @Jim137, I think I merged PR #98 yesterday, and pykan v0.0.3 was released after the merge. Please check whether it works now, thank you!

KindXiaoming avatar May 08 '24 14:05 KindXiaoming

Hi @KindXiaoming, Apologies for the confusion. It seems that pykan v0.0.3 doesn't contain PR #98. The last commit in v0.0.3 is 116f399, which predates commit 70b7b8d where the fix was implemented.

Jim137 avatar May 08 '24 15:05 Jim137

Thanks @Jim137, is it good now?

KindXiaoming avatar May 09 '24 00:05 KindXiaoming

@KindXiaoming, sorry, I meant for users installing from PyPI. The error still occurs.

Here is my test using the PyPI version:

❯ /home/jim137/git/kan/test/bin/python /home/jim137/git/kan/test1.py
description:   0%|                                                           | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/jim137/git/kan/test1.py", line 23, in <module>
    model.train(dataset, opt="LBFGS", steps=20)
  File "/home/jim137/git/kan/test/lib/python3.11/site-packages/kan/KAN.py", line 898, in train
    self.update_grid_from_samples(dataset['train_input'][train_id].to(device))
  File "/home/jim137/git/kan/test/lib/python3.11/site-packages/kan/KAN.py", line 243, in update_grid_from_samples
    self.forward(x)
  File "/home/jim137/git/kan/test/lib/python3.11/site-packages/kan/KAN.py", line 311, in forward
    x_numerical, preacts, postacts_numerical, postspline = self.act_fun[l](x)
                                                           ^^^^^^^^^^^^^^^^^^
  File "/home/jim137/git/kan/test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jim137/git/kan/test/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jim137/git/kan/test/lib/python3.11/site-packages/kan/KANLayer.py", line 176, in forward
    y = self.scale_base.unsqueeze(dim=0) * base + self.scale_sp.unsqueeze(dim=0) * y
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

It seems that #98 is not included in v0.0.3.

Jim137 avatar May 09 '24 03:05 Jim137
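
When a "found at least two devices" error like the one above appears, it can help to list where each parameter of the model actually lives. The following is a minimal sketch using the standard `named_parameters()` and `named_buffers()` methods of `torch.nn.Module`; `parameter_devices` is a hypothetical helper name, and the `torch.nn.Linear` layer only stands in for a real model.

```python
import torch


def parameter_devices(module):
    """Map each parameter and buffer name to the device it lives on.

    Hypothetical diagnostic helper; works on any torch.nn.Module.
    """
    placement = {}
    tensors = list(module.named_parameters()) + list(module.named_buffers())
    for name, tensor in tensors:
        placement[name] = str(tensor.device)
    return placement


if __name__ == "__main__":
    layer = torch.nn.Linear(2, 3)  # stand-in for a real model
    for name, device in parameter_devices(layer).items():
        print(f"{name}: {device}")
```

Any entry whose device differs from the device of the input tensor is a candidate for the mismatch reported in the traceback.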

Thank you, gotcha! I have released 0.0.4, which includes the new changes.

KindXiaoming avatar May 09 '24 10:05 KindXiaoming
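
To confirm which release is installed locally, the version can be read from the package metadata. This is a sketch under assumptions: `parse_version` and `installed_has_fix` are hypothetical helpers, the comparison only handles plain `X.Y.Z` version strings, and the `0.0.4` threshold is taken from the comment above rather than verified against the release contents.

```python
from importlib import metadata


def parse_version(text):
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison.

    Simplification: suffixes such as 'rc1' or '.post1' are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def installed_has_fix(package="pykan", minimum="0.0.4"):
    """Return True/False for the version check, or None if not installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    print(installed_has_fix())
```

A result of False suggests upgrading the package from PyPI; None means the package is not installed in the current environment.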