pykan
This handles CPU-only, CUDA-only, and mixed CPU/CUDA setups. It solves the post-fix_symbolic
problem with CUDA, the initialize_from_another_model
problem with CUDA, and the related CPU
problem (already mentioned in this PR) that forced users onto CUDA.
@KindXiaoming This should close many issues related to using CUDA. For it to work properly, I recommend updating requirements.txt to the following:
matplotlib==3.6.2
numpy==1.26.4
scikit-learn==1.4.2
setuptools==69.5.1
sympy==1.11.1
torch==2.2.2
tqdm==4.66.2
Please let me know whether you want me to open another PR or you'd rather handle this yourself.
There's another device missing in https://github.com/KindXiaoming/pykan/blob/master/kan/KAN.py#L205. I've addressed it in my fork at https://github.com/Jim137/pykan/tree/develop. Would you be open to merging my changes and submitting a pull request together?
Good point, I added it.
I don't know why, but if I use MPS (Apple Silicon) the loss is NaN.
model.train(dataset, opt="LBFGS", steps=20, lamb=0.01, lamb_entropy=10., device=device.type);
train loss: nan | test loss: nan | reg: nan : 100%|█████████████████| 20/20 [00:03<00:00, 5.11it/s]
@brainer3220 I'm afraid I can't help much with MPS, but it nonetheless seems to be a common issue between MPS and Torch (see https://github.com/pytorch/pytorch/issues/112834, for example).
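For anyone who just needs training to run on Apple Silicon in the meantime, a minimal sketch of a workaround is to fall back to CPU instead of MPS (pick_device below is a hypothetical helper, not part of pykan):

```python
import torch

def pick_device(allow_mps: bool = False) -> torch.device:
    # Prefer CUDA; only use MPS when explicitly allowed, since some ops
    # still produce NaNs on MPS, as in the PyTorch issue linked above.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if allow_mps and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()  # stays on CPU on Apple Silicon until the MPS issue is fixed
print(device)
```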
I am trying to run the given KAN example in Colab with @AlessandroFlati's AlessandroFlati:develop implementation:
Still getting the above error. I used the following requirements: matplotlib==3.6.2, numpy==1.26.4, scikit-learn==1.4.2, setuptools==69.5.1, sympy==1.11.1, torch==2.2.1, tqdm==4.66.2.
When I try to run on CPU instead, it still complains that no NVIDIA drivers are available.
Any help to resolve this is appreciated. Thanks!
First you need to initialize a torch.device, like this: device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
Then pass device=device to all the constructors.
Finally, you need to put the dataset tensors on the device:
dataset['train_input'] = dataset['train_input'].to(device)
dataset['train_label'] = dataset['train_label'].to(device)
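Putting those three steps together, a minimal sketch (the width/grid/k values and the target function are just the README example's, and passing device to the KAN constructor assumes this PR's changes are in place):

```python
import torch
from kan import KAN, create_dataset

# 1. Initialize the device once.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 2. Pass device=device to the constructors.
model = KAN(width=[2, 5, 1], grid=5, k=3, seed=0, device=device)

# 3. Move the dataset tensors onto the same device.
f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
dataset = create_dataset(f, n_var=2)
dataset['train_input'] = dataset['train_input'].to(device)
dataset['train_label'] = dataset['train_label'].to(device)
```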
Thanks, now I am able to run on colab GPU. But the CPU problem persists.
This pull request solves it, you can try to modify pykan like in those commits: https://github.com/KindXiaoming/pykan/pull/98/commits/d606bd88bd76f867ef1e2e0780d68fb4f378ce65 https://github.com/KindXiaoming/pykan/pull/98/commits/c857dd65b737ce5f1845555416ddef8ba7865ff8
I had the same problem #75.
Hi @AlessandroFlati, I would appreciate it if you could make another PR! Thanks in advance :)
```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device(type='cuda')
print(torch.cuda.is_available())
# True
model.to(device)
dataset['train_input'] = dataset['train_input'].to(device)
dataset['train_label'] = dataset['train_label'].to(device)
```
But there is still a problem:
```
--> 170 x = torch.einsum('ij,k->ikj', x, torch.ones(self.out_dim, device=self.device)).reshape(batch, self.size).permute(1, 0)
    171 preacts = x.permute(1, 0).clone().reshape(batch, self.out_dim, self.in_dim)
    172 base = self.base_fun(x).permute(1, 0)  # shape (batch, size)

File E:\anaconda\envs\4torch2\lib\site-packages\torch\functional.py:380, in einsum(*args)
    375     return einsum(equation, *_operands)
    377 if len(operands) <= 2 or not opt_einsum.enabled:
    378     # the path for contracting 0 or 1 time(s) is already optimized
    379     # or the user has disabled using opt_einsum
--> 380     return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
    382 path = None
    383 if opt_einsum.is_available():

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```
You shouldn't just call model.to(device); rather, create both the model and the dataset passing the device=device argument. Besides, you're missing the test_input and test_label keys in dataset.
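A sketch of what that looks like end to end (again, the KAN(..., device=...) argument assumes this PR's changes, and the target function is only a placeholder):

```python
import torch
from kan import KAN, create_dataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Construct the model with device=... instead of calling model.to(device),
# so the internal grids and masks are created on the right device too.
model = KAN(width=[2, 5, 1], grid=5, k=3, seed=0, device=device)

f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
dataset = create_dataset(f, n_var=2)

# All four keys have to exist and live on the same device, not just the train split.
for key in ("train_input", "train_label", "test_input", "test_label"):
    dataset[key] = dataset[key].to(device)

model.train(dataset, opt="LBFGS", steps=20, lamb=0.01, lamb_entropy=10., device=device.type)
```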