
Ensemble for PyTorch Geometric

Open ParasKoundal opened this issue 3 years ago • 8 comments

Hi, I want to use Ensemble-PyTorch with PyTorch-Geometric. However, it doesn't recognize the dataloaders.

Is support for this still under development, or is this a bug?

ParasKoundal avatar Jan 01 '22 21:01 ParasKoundal

Hi @ParasKoundal, could you provide a code snippet showing how you use the dataloaders with graph data, so that we can take a closer look?

xuyxu avatar Jan 03 '22 04:01 xuyxu

@xuyxu It is simple.

.....
from torch_geometric.loader import DataLoader
.......

train_loader = DataLoader(train_dataset, batch_size=batch_s, shuffle=True, drop_last=True)
val_loader = DataLoader(val_dataset, batch_size=2, drop_last=True)
test_loader = DataLoader(test_dataset, batch_size=2, drop_last=True)
.......

I have created a custom class to preprocess the dataset before loading it into the dataloader.

After that, I followed the steps in https://ensemble-pytorch.readthedocs.io/en/latest/quick_start.html. For regression, I first tried VotingRegressor, which doesn't work (it fails with the error from the initial issue). The other ensembles fail in a similar way.

ParasKoundal avatar Jan 03 '22 08:01 ParasKoundal

Could you also provide the full exception traceback? Thanks!

xuyxu avatar Jan 03 '22 11:01 xuyxu

@xuyxu

Here it is:

test_loader=test_loader
  File "/cr/data02/koundal/applications/gpu-project/lib/python3.7/site-packages/torchensemble/bagging.py", line 329, in fit
    self.n_outputs = self._decide_n_outputs(train_loader)
  File "/cr/data02/koundal/applications/gpu-project/lib/python3.7/site-packages/torchensemble/_base.py", line 267, in _decide_n_outputs
    _, target = split_data_target(elem, self.device)
  File "/cr/data02/koundal/applications/gpu-project/lib/python3.7/site-packages/torchensemble/utils/io.py", line 84, in split_data_target
    raise ValueError(msg)
ValueError: Invalid dataloader, please check if the input dataloder is valid.

ParasKoundal avatar Jan 05 '22 23:01 ParasKoundal

This could be a side effect of the commit for issue #75. I will see if it can be fixed in the next few days. Thanks for reporting, @ParasKoundal!

xuyxu avatar Jan 06 '22 05:01 xuyxu

@xuyxu Any update on this?

ParasKoundal avatar Jan 12 '22 12:01 ParasKoundal

Hi @ParasKoundal, sorry, I am kind of busy these days, and will take a look during the next weekend.

xuyxu avatar Jan 12 '22 12:01 xuyxu

In torchensemble, the input loader is expected to return, at each iteration, a list in one of the following forms:

  • [data_tensor, target_tensor]
  • [data_tensor_1, data_tensor_2, ..., target_tensor]

The first form is the most widely used one for dataloaders (i.e., for batch_idx, (data, target) in enumerate(loader)), while the second comes from the feature request in #75 to support multiple input tensors.
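To make the two forms concrete, here is a minimal sketch of the convention: the last item of each batch is the target and everything before it is an input. The helper name split_inputs_and_target is hypothetical and is not the actual code in torchensemble/utils/io.py; plain strings stand in for tensors.

```python
from typing import Any, List, Sequence, Tuple


def split_inputs_and_target(elem: Sequence[Any]) -> Tuple[List[Any], Any]:
    """Hypothetical helper: the last item is the target, the rest are inputs."""
    if not isinstance(elem, (list, tuple)) or len(elem) < 2:
        raise ValueError("expected [data, ..., target] with at least two items")
    *inputs, target = elem
    return inputs, target


# Form 1: [data_tensor, target_tensor]
single_inputs, single_target = split_inputs_and_target(["data", "target"])

# Form 2: [data_tensor_1, data_tensor_2, ..., target_tensor]
multi_inputs, multi_target = split_inputs_and_target(["data_1", "data_2", "target"])
```

A batch that is a single object rather than a list, such as a graph batch, would be rejected by a check like this one.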

However, the dataloader in PyTorch Geometric conforms to neither of them:

  • (positive_batch, negative_batch)

which does not contain a target tensor since the label is simply the index of the batch in the tuple returned.

Here is a simple solution; please let me know if it solves your problem with using torchensemble models in PyTorch Geometric. The general idea is to override the _sample method. Taking MetaPath2Vec as an example, we could declare a new class like:

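from typing import List

import torch
from torch import Tensor
from torch_geometric.nn import MetaPath2Vec

class CustomMetaPath2Vec(MetaPath2Vec):

    def _sample(self, batch: List[int]) -> List[Tensor]:
        if not isinstance(batch, Tensor):
            batch = torch.tensor(batch, dtype=torch.long)

        pos_sample = self._pos_sample(batch)
        neg_sample = self._neg_sample(batch)

        # Stack positive and negative samples into a single data tensor.
        data = torch.cat((pos_sample, neg_sample), dim=0)
        # Label positive rows with 1 and negative rows with 0.
        target = torch.cat(
            (torch.ones(pos_sample.size(0)), torch.zeros(neg_sample.size(0))),
            dim=0
        )

        # Return a list so the batch matches the [data_tensor, target_tensor] form.
        return [data, target]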

Using this new class, the positive_batch and negative_batch will be concatenated as one tensor data, and you can identify them via the target tensor.
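As a rough illustration of how the two parts could be recovered later, the sketch below splits the concatenated tensor with a boolean mask built from target. The tensor shapes and the helper name split_by_target are made up for the example and are not part of either library.

```python
import torch

# Stand-ins for the positive and negative samples (shapes are illustrative).
pos_sample = torch.arange(6).reshape(3, 2)
neg_sample = torch.arange(6, 16).reshape(5, 2)

# Same concatenation and labelling as in the class above.
data = torch.cat((pos_sample, neg_sample), dim=0)
target = torch.cat(
    (torch.ones(pos_sample.size(0)), torch.zeros(neg_sample.size(0))),
    dim=0,
)


def split_by_target(data: torch.Tensor, target: torch.Tensor):
    """Hypothetical helper: rows labelled 1 are positive, the rest negative."""
    pos_mask = target == 1
    return data[pos_mask], data[~pos_mask]


pos_rw, neg_rw = split_by_target(data, target)
```

Boolean indexing keeps the row order, so pos_rw and neg_rw match the tensors that were concatenated.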

In addition, some extra steps are required in the forward function of downstream base estimators.
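For the torch_geometric.loader.DataLoader from the original report, a similar idea might be sketched as a thin adapter that re-emits each graph batch as a list ending in the target. This is untested against torchensemble and rests on several assumptions: ListBatchLoader and graph_batch_to_list are hypothetical names, each batch is assumed to expose the usual x, edge_index, batch and y attributes, and the base estimator's forward is assumed to accept the inputs in that order. torchensemble may also read other loader attributes that this adapter does not forward.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable, Iterator, List


class ListBatchLoader:
    """Hypothetical adapter: yield each batch of `loader` as a list."""

    def __init__(self, loader: Iterable[Any], to_list: Callable[[Any], List[Any]]):
        self.loader = loader
        self.to_list = to_list

    def __iter__(self) -> Iterator[List[Any]]:
        for batch in self.loader:
            yield self.to_list(batch)

    def __len__(self) -> int:
        return len(self.loader)


def graph_batch_to_list(batch: Any) -> List[Any]:
    # Assumption: the batch carries x, edge_index, batch and y; y goes last
    # so that it is treated as the target.
    return [batch.x, batch.edge_index, batch.batch, batch.y]


# Stand-in batches so the sketch runs without torch_geometric installed.
fake_loader = [
    SimpleNamespace(x="x0", edge_index="e0", batch="b0", y="y0"),
    SimpleNamespace(x="x1", edge_index="e1", batch="b1", y="y1"),
]
wrapped = ListBatchLoader(fake_loader, graph_batch_to_list)
batches = list(wrapped)
```

With real data, the wrapped object would be built as ListBatchLoader(train_loader, graph_batch_to_list) and passed in place of train_loader.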

Looking forward to your kind reply @ParasKoundal

xuyxu avatar Jan 16 '22 05:01 xuyxu