Soft-Decision-Tree
Is it suitable for regression prediction?
Hello, I'd like to ask: if I want to do regression prediction with output_dim == 1, is SDT applicable? (It seems to be used only for classification models.)
Thanks!
Hi @775269512, I think it is straightforward to use SDT on regression tasks: simply change the training criterion in main.py, and there should be no need to modify anything inside the implementation of SDT.
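For example, a minimal sketch of the change (this assumes main.py currently uses nn.CrossEntropyLoss(); the shapes and names below are only illustrative):

```python
import torch
import torch.nn as nn

# Illustrative tensors only: "output" stands for the SDT forward output
# and "target" for the regression labels.
output = torch.randn(32, 1)        # SDT prediction with output_dim = 1
target = torch.randn(32, 1)        # float targets, same shape as the output

# Regression criterion in place of the classification nn.CrossEntropyLoss()
criterion = nn.MSELoss()
loss = criterion(output, target)   # shapes must match: (batch, 1) vs. (batch, 1)
```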
Hi, I ran a simple experiment. Although the overall loss is decreasing, the output for every sample is the same, so the model cannot actually regress (here out_dim = 1).
x = tensor([[ 1., 1., 1., 1., 1.], [ 2., 2., 2., 2., 2.], [ 3., 3., 3., 3., 3.], [ 4., 4., 4., 4., 4.], [ 5., 5., 5., 5., 5.], [ 6., 6., 6., 6., 6.], [ 7., 7., 7., 7., 7.], [ 8., 8., 8., 8., 8.], [ 9., 9., 9., 9., 9.], [10., 10., 10., 10., 10.]]) and y = np.array([5.56, 5.70, 5.91, 6.40, 6.80, 7.05, 8.90, 8.70, 9.00, 9.05]).ravel()
I got this result: tensor([[7.1672], [7.3185], [7.3203], [7.3204], [7.3204], [7.3204], [7.3204], [7.3204], [7.3204], [7.3204]], grad_fn=<MmBackward>), with Epoch: 499 | Loss: 1.93230 | Correct: 000/128
I don't know how to change it. It seems that when out_dim = 1, the values of all leaf nodes end up the same.
QAQ
Could you show me your code snippet for training and evaluation?
Yep, it's here. In addition, I found a paper that builds on SDT, which I will study: "SDTR: Soft Decision Tree Regressor for Tabular Data". Differentiable decision trees are hard to understand, but this is a nice interpretable model and I want to use it for something.
Btw, I will also start my postgraduate studies at NJU next semester. It turns out you are my senior~
```python
import torch
import torch.nn as nn
from sklearn.datasets import fetch_california_housing

from SDT import SDT

# Load data
housing = fetch_california_housing()
xs = torch.from_numpy(housing["data"]).float()
ys = torch.from_numpy(housing["target"]).unsqueeze(1).float()
print(xs.size())
print(ys.size())
print(xs)
input_dim = xs.size()[1]
output_dim = ys.size()[1]

# Model and optimizer
# (depth, lamda, use_cuda, lr, weight_decay, epochs, batch_size are set elsewhere)
tree = SDT(input_dim, output_dim, depth, lamda, use_cuda)
optimizer = torch.optim.Adam(tree.parameters(),
                             lr=lr,
                             weight_decay=weight_decay)

# Utils
best_testing_acc = 0.0
testing_acc_list = []
training_loss_list = []
criterion = nn.MSELoss()
device = torch.device("cuda" if use_cuda else "cpu")

for epoch in range(epochs):
    # Training (full batch: the whole dataset is fed in every iteration)
    tree.train()
    output, penalty = tree.forward(xs, is_training_data=True)
    print(output)
    loss = criterion(output, ys)  # keep ys as (N, 1) so it matches the output shape
    # loss += penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print training status (the "correct" count is leftover from the
    # classification script and is meaningless for a 1-dim regression output)
    pred = output.data.max(1)[1]
    correct = pred.eq(ys.view(-1).data).sum()
    msg = (
        "Epoch: {:02d} | Loss: {:.5f} |"
        " Correct: {:03d}/{:03d}"
    )
    print(msg.format(epoch, loss, correct, batch_size))
    training_loss_list.append(loss.cpu().data.numpy())
```
It looks like you are using a full-batch training process (i.e., without a dataloader that samples mini-batches); maybe you should consider using one and training SDT in a stochastic way. Besides, what are the values of the learning rate and weight decay?
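For example, something along these lines (just a rough sketch using your variable names; the batch size, learning rate, weight decay, and epoch count are placeholders):

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

# "tree", "xs", and "ys" are as in your snippet above.
loader = DataLoader(TensorDataset(xs, ys), batch_size=128, shuffle=True)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(tree.parameters(), lr=1e-3, weight_decay=5e-4)

for epoch in range(100):
    tree.train()
    # Iterate over randomly sampled mini-batches instead of the full dataset
    for batch_x, batch_y in loader:
        output, penalty = tree.forward(batch_x, is_training_data=True)
        loss = criterion(output, batch_y)   # both (batch_size, 1) when output_dim = 1
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```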