
How to train own dataset for regression?

Open SuleymanSuleymanzade opened this issue 9 months ago • 2 comments

Hello, how do I train my own dataset for a regression task? I created the dataset in the following way to test regression:

dataset = {
    'train_input':torch.from_numpy(X_train[:3000]),
    'test_input': torch.from_numpy(X_test[:2000]),
    'train_label':torch.from_numpy(y_train[:3000]),
    'test_label':torch.from_numpy(y_test[:2000]),
}

but when I start training the model with

model.train(dataset, opt="LBFGS", steps=20, lamb=0.01, lamb_entropy=10.);

it gave me an error:

File /opt/conda/lib/python3.10/site-packages/kan/LBFGS.py:319, in LBFGS.step(self, closure)
    316 state.setdefault('n_iter', 0)
    318 # evaluate initial f(x) and df/dx
--> 319 orig_loss = closure()
    320 loss = float(orig_loss)
    321 current_evals = 1

File /opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/kan/KAN.py:897, in KAN.train.<locals>.closure()
    895     train_loss = loss_fn(pred[id_], dataset['train_label'][train_id][id_].to(device))
    896 else:
--> 897     train_loss = loss_fn(pred, dataset['train_label'][train_id].to(device))
    898 reg_ = reg(self.acts_scale)
    899 objective = train_loss + lamb*reg_

IndexError: index 2941 is out of bounds for dimension 0 with size 2000

SuleymanSuleymanzade · May 02 '24 11:05

Hi, could you please check the shapes of your inputs and labels? In particular, dataset['train_label'] should have shape [3000, x], but it looks like it has shape [2000, x] somehow.
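For example, a quick sanity check along these lines (a minimal sketch, assuming the `dataset` dict from the original post is in scope) would confirm whether the label tensors are shorter than the input tensors:

# Sketch: compare input vs. label counts for each split
# (assumes the `dataset` dict built in the original post).
for split in ('train', 'test'):
    n_inputs = dataset[f'{split}_input'].shape[0]
    n_labels = dataset[f'{split}_label'].shape[0]
    print(f"{split}: {n_inputs} inputs, {n_labels} labels")
    assert n_inputs == n_labels, f"{split} split has mismatched input/label counts"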

KindXiaoming · May 02 '24 13:05

@SuleymanSuleymanzade, it appears that you may have a data slicing issue when creating your dataset. Can you post the shapes of each of your dataset components, like so?

print("Train Input Shape:", dataset['train_input'].shape)
print("Train Label Shape:", dataset['train_label'].shape)
print("Test Input Shape:", dataset['test_input'].shape)
print("Test Label Shape:", dataset['test_label'].shape)

I haven't seen this error yet, but your training and test data appear to contain the same data ([:3000] and [:2000], respectively), which points to a slicing problem. I would suggest slicing it this way and seeing if it works for you:

# Assuming your original dataset is stored in a DataFrame `df` with all the features you need/want.
# Replace the slicing with whatever range you want/need.

import numpy as np
import torch

dataset = {}

# Note: df.drop(..., inplace=True) returns None, so build the arrays from non-inplace copies.
features = df.drop(columns=['<target_var>']).to_numpy()
labels = df['<target_var>'].to_numpy()

# Hold out the last 1000 rows as the test split.
train_input, train_label = features[:-1000], labels[:-1000]
test_input, test_label = features[-1000:], labels[-1000:]

dataset['train_input'] = torch.from_numpy(train_input)
dataset['train_label'] = torch.from_numpy(train_label.reshape(-1, 1))
dataset['test_input'] = torch.from_numpy(test_input)
dataset['test_label'] = torch.from_numpy(test_label.reshape(-1, 1))
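
If the shapes line up but training still complains, one more thing to watch for: torch.from_numpy on a float64 array gives double-precision tensors, while the model weights are typically float32, so casting the dataset to float32 before training can avoid a dtype mismatch. Below is a minimal sketch of the full training call, assuming the dataset dict built above; the width/grid/k values are placeholders, not recommendations.

# Sketch: cast the dataset to float32 and train a small KAN on it.
# width/grid/k are placeholder hyperparameters; width[0] should match the
# number of feature columns and width[-1] is 1 for scalar regression.
import torch
from kan import KAN

for key in dataset:
    dataset[key] = dataset[key].float()

model = KAN(width=[dataset['train_input'].shape[1], 5, 1], grid=5, k=3)
model.train(dataset, opt="LBFGS", steps=20, lamb=0.01, lamb_entropy=10.)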

matthewdillonsmith · May 03 '24 12:05