pytorch-frame
sklearn-compatible interface
I think it would be great to have this feature, since sklearn is often used for tabular data. I tried to use skorch, but skorch does not accept TensorFrames and did not work.
(examples/tutorial.py)
from skorch import NeuralNetClassifier
net = NeuralNetClassifier(module=model, max_epochs=args.epochs, lr=args.lr,
                          device=device, batch_size=args.batch_size,
                          classes=dataset.num_classes, iterator_train=DataLoader,
                          iterator_valid=DataLoader, train_split=None)
net.fit(train_dataset, y=None)
Traceback (most recent call last):
File "\examples\tutorial.py", line 346, in <module>
net.fit(train_dataset, y=None)
File "\site-packages\skorch\classifier.py", line 165, in fit
return super(NeuralNetClassifier, self).fit(X, y, **fit_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "\site-packages\skorch\net.py", line 1319, in fit
self.partial_fit(X, y, **fit_params)
File "\site-packages\skorch\net.py", line 1278, in partial_fit
self.fit_loop(X, y, **fit_params)
File "\site-packages\skorch\net.py", line 1190, in fit_loop
self.run_single_epoch(iterator_train, training=True, prefix="train",
File "\site-packages\skorch\net.py", line 1226, in run_single_epoch
step = step_fn(batch, **fit_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "\site-packages\skorch\net.py", line 1105, in train_step
self._step_optimizer(step_fn)
File "\site-packages\skorch\net.py", line 1060, in _step_optimizer
optimizer.step(step_fn)
File "\site-packages\torch\optim\optimizer.py", line 373, in wrapper
out = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "\site-packages\torch\optim\optimizer.py", line 76, in _use_grad
ret = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "\site-packages\torch\optim\sgd.py", line 66, in step
loss = closure()
^^^^^^^^^
File "\site-packages\skorch\net.py", line 1094, in step_fn
step = self.train_step_single(batch, **fit_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "\site-packages\skorch\net.py", line 993, in train_step_single
y_pred = self.infer(Xi, **fit_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "\site-packages\skorch\net.py", line 1517, in infer
x = to_tensor(x, device=self.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "\site-packages\skorch\utils.py", line 104, in to_tensor
return [to_tensor_(x) for x in X]
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "\site-packages\skorch\utils.py", line 104, in <listcomp>
return [to_tensor_(x) for x in X]
^^^^^^^^^^^^^
File "\site-packages\skorch\utils.py", line 118, in to_tensor
raise TypeError("Cannot convert this data type to a torch tensor.")
TypeError: Cannot convert this data type to a torch tensor.
I think the following changes are needed:
- Add an ability to convert from DataFrame to TensorFrame without much prior information.
- Create a wrapper that passes Tensor to skorch or create a scikit-learn compatible estimator specifically for this package.
I am sorry, but I cannot take much time to assist in creating this feature, so if it is not possible, please close this.
You can convert a DataFrame to a TensorFrame easily with
dataset = Dataset(df, col_to_stype=col_to_stype, target_col="y")
dataset.materialize()  # tensor_frame is only available after materialization
dataset.tensor_frame
See the tutorial.
Thanks for your suggestion! I think this is great to add. Setting this as a P2 feature, as we first want to prioritize more stype support: https://github.com/pyg-team/pytorch-frame/issues/88.
Is someone already working on that?
Not as far as I know. Let us know if you are interested!
Yes, I'm interested. Hence, you can assign this to me. How fast should this task be completed?
@MacOS Great, thank you! It'd be good to complete this feature by the end of January. Would that be possible?
@weihua916 As of now, yes.
I have tried this and it seems to be very difficult. As a quick fix that isn't pretty, the following seems necessary:
- Patch skorch.utils.to_tensor to bypass TensorFrame.
- Add index = torch.tensor(index) to torch_frame.DataLoader.collate_fn to make it return TensorFrame instead of list[TensorFrame].
~However, I don't know how to pass the validation dataset.~
Next, we want to pass a validation dataset as well, but if we pass them using a tuple like skorch.NeuralNet.fit((train_dataset.tensor_frame, val_dataset.tensor_frame), None), skorch would raise a lot of errors. Therefore, I tried to split them inside skorch.
- ~Pass col_to_stype as y, as in skorch.NeuralNet.fit(dataset.df, dataset.col_to_stype), utilizing the internal structure.~
- Remove self.check_data(X, y) in skorch.NeuralNet.fit_loop().
- ~Modify TensorFrame to call self.materialize() in the constructor.~
- To avoid an error in torch_frame.Dataset.split(), set split_col like skorch.NeuralNet(... , dataset=lambda d, c: Dataset(d, c, split_col='split_col')).
:thinking:
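The first quick fix in the list above (letting the conversion step pass a TensorFrame through untouched) can be sketched without skorch or torch installed. Everything here is a stand-in and an assumption for illustration: FakeFrame plays the role of TensorFrame, to_tensor is a simplified imitation of a converter that rejects unknown types, and with_passthrough is a hypothetical helper, not part of skorch.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FakeFrame:
    """Stand-in for torch_frame.TensorFrame (hypothetical, illustration only)."""
    rows: list


def to_tensor(x: Any) -> Any:
    """Simplified imitation of a converter that only accepts known types."""
    if isinstance(x, (int, float)):
        return float(x)
    if isinstance(x, (list, tuple)):
        return [to_tensor(item) for item in x]
    raise TypeError("Cannot convert this data type to a torch tensor.")


def with_passthrough(
    convert: Callable[[Any], Any], passthrough_type: type
) -> Callable[[Any], Any]:
    """Wrap `convert` so instances of `passthrough_type` are returned unchanged."""

    def wrapper(x: Any) -> Any:
        if isinstance(x, passthrough_type):
            return x  # bypass conversion entirely
        if isinstance(x, (list, tuple)):
            # Recurse here so nested frames also reach the bypass.
            return [wrapper(item) for item in x]
        return convert(x)

    return wrapper


frame = FakeFrame(rows=[[1, 2], [3, 4]])

# The unpatched converter rejects the frame, as in the traceback.
try:
    to_tensor(frame)
    failed = False
except TypeError:
    failed = True

# The wrapped converter hands the frame back unchanged.
patched_to_tensor = with_passthrough(to_tensor, FakeFrame)
result = patched_to_tensor(frame)
```

In a real patch the wrapper would have to replace the function wherever the library looks it up, which is why this kind of fix is fragile across library versions.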
Thank you for looking into this, @34j! I was about to start working on it.
Add an ability to convert from DataFrame to TensorFrame without much prior information.
I would have simply converted the DataFrame to a TensorFrame internally, worked with it, and, if requested, returned a DataFrame again. This means, of course, that one has to track what was given. Or am I missing something?
Create a wrapper that passes Tensor to skorch or create a scikit-learn compatible estimator specifically for this package.
This seems to be very big and unrealistic because we would have to make all estimators compatible with scikit-learn, which is a lot to ask for. At the moment, scikit-learn is an optional dependency.
May I ask you, @34j, to post a self-contained example (or examples) showing what would qualify pytorch-frame as sklearn-compatible?
PS: I would submit one PR today, but maybe only as a draft.
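As a rough illustration of what "sklearn-compatible" usually refers to, here is a minimal sketch of the scikit-learn estimator conventions written in plain Python, without importing sklearn. MajorityClassifier and its fallback parameter are hypothetical names made up for this example; it is not taken from pytorch-frame or skorch, and actual compatibility would have to be verified with scikit-learn's own estimator checks.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence


class MajorityClassifier:
    """Toy estimator that always predicts the most frequent training label."""

    def __init__(self, fallback: Any = None):
        # Convention: __init__ only stores hyperparameters, no validation or fitting.
        self.fallback = fallback

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        # Convention: expose every constructor argument by name.
        return {"fallback": self.fallback}

    def set_params(self, **params: Any) -> "MajorityClassifier":
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self

    def fit(self, X: Sequence[Any], y: Sequence[Any]) -> "MajorityClassifier":
        # Convention: learned state is stored in attributes ending with "_".
        counts = Counter(y)
        self.classes_ = sorted(counts)
        self.majority_ = counts.most_common(1)[0][0] if counts else self.fallback
        return self  # Convention: fit returns self so calls can be chained.

    def predict(self, X: Sequence[Any]) -> List[Any]:
        return [self.majority_ for _ in X]


X_train = [[0.0], [1.0], [2.0]]
y_train = ["a", "b", "b"]

clf = MajorityClassifier().fit(X_train, y_train)
preds = clf.predict([[5.0], [6.0]])

# Cloning via get_params is what lets tools like grid search rebuild an estimator.
clone = type(clf)(**clf.get_params())
```

A wrapper such as skorch provides these methods around a PyTorch module, which is why the discussion focuses on getting TensorFrame through skorch rather than rewriting every model.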
Add an ability to convert from DataFrame to TensorFrame without much prior information.
This is an implicit request for the recently implemented infer_df_stype, which has thankfully already been resolved.
Create a wrapper that passes Tensor to skorch
I feel like this could probably be done. I'll send a draft PR in an hour, and I'd like to ask @MacOS to take it over and do the documentation, testing, and tutorial work.
dirty prototype code
examples/tutorial.py:
from typing import List

import torch
import torch.nn as nn
from skorch import NeuralNetClassifier
from skorch.dataset import Dataset as SkorchDataset
from torch import Tensor
from torch_frame import TensorFrame
from torch_frame.data import DataLoader
from torch_frame.data.dataset import DataFrameToTensorFrameConverter, Dataset
from torch_frame.utils import infer_df_stype

# `dataset`, `model`, `args`, and `device` are defined earlier in the tutorial script.

def create_dataset(df, _) -> Dataset:
    # skorch calls this as dataset(X, y); y is unused because the target lives in df.
    dataset_ = Dataset(
        df, dataset.col_to_stype, split_col="split_col", target_col="target_col"
    )
    dataset_.materialize()
    return dataset_

def split_dataset(dataset: Dataset) -> tuple[TensorFrame, TensorFrame]:
    # Keep only the train and validation splits.
    datasets = dataset.split()[:2]
    return datasets[0].tensor_frame, datasets[1].tensor_frame

def get_iterator(dataset: SkorchDataset, **kwargs) -> DataLoader:
    return DataLoader2(dataset, **kwargs)

class DataLoader2(DataLoader):
    def collate_fn(
        self, index: int | List[int] | range | slice | Tensor
    ) -> tuple[TensorFrame, Tensor | None]:
        # Convert the index to a tensor so a single TensorFrame is returned.
        index = torch.tensor(index)
        res = super().collate_fn(index).to(device)
        # skorch expects (X, y) batches.
        return res, res.y

net = NeuralNetClassifier(
    module=model,
    max_epochs=args.epochs,
    lr=args.lr,
    device=device,
    batch_size=6,
    iterator_train=get_iterator,
    dataset=create_dataset,
    iterator_valid=get_iterator,
    train_split=split_dataset,
    classes=dataset.df["target_col"].unique(),
    verbose=1,
    criterion=nn.CrossEntropyLoss,
)
net.fit(dataset.df, None)
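The train_split callable in the prototype relies on a split column to separate training rows from validation rows. Below is a minimal plain-Python sketch of that idea. It assumes the split column encodes train as 0 and validation as 1, which is an assumption made for this example and should be checked against the torch_frame documentation. Row and split_rows are hypothetical names.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

# Assumed encoding of the split column (illustration only).
TRAIN, VAL = 0, 1


def split_rows(
    rows: List[Row], split_col: str = "split_col"
) -> Tuple[List[Row], List[Row]]:
    """Return (train_rows, val_rows) according to the value in `split_col`."""
    train = [row for row in rows if row[split_col] == TRAIN]
    val = [row for row in rows if row[split_col] == VAL]
    return train, val


rows = [
    {"x": 1.0, "target_col": "a", "split_col": 0},
    {"x": 2.0, "target_col": "b", "split_col": 0},
    {"x": 3.0, "target_col": "a", "split_col": 1},
    {"x": 4.0, "target_col": "b", "split_col": 2},  # held out, ignored here
]

train_rows, val_rows = split_rows(rows)
```

Keeping the split inside the data means fit() only needs a single input, which is what the prototype exploits by passing dataset.df and letting the dataset and train_split callables do the rest.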
@34j Is fine with me!
So we drop the second part of your request then, correct?
Heads up everyone, I started working on it. I have already merged @34j's draft PR into my fork.
Would be nice if you guys would be available in case I have questions. :)
~May I ask you what is your question~ nvm plz, sorry for my terrible English comprehension
So far none. I meant just in case.
Sorry for the delay, I had personal matters to deal with. I'm confident that I can submit a PR this month.
Hi all,
a short update: unfortunately, I got sick, so there is another delay. Should I still work on it?
I think it should continue. Are you still working on this part? Otherwise I can take over.
Yes, still working on it @qychen2001!
That's great! This feature is really important, looking forward to your PR.
Sorry, but I have almost completed this feature myself in #375 (as MacOS seemed to be sick) and am just waiting for @weihua916's review. However, the pre-commit styling work by MacOS that I referred to certainly helped.
That's fantastic! But I'm still concerned about the relationship between skorch and sklearn. Can your PR directly support models in sklearn such as svm?
Excuse me but what do you mean by relationship? skorch works perfectly, trust me plz 🫠
can your PR directly support models in sklearn such as svm?
sklearn models already have sklearn-compatible interface apparently