torchdrug ValueError: Fail to parse the docstring of `Smol`. Inconsistent number of parameters in signature and docstring.

I am trying to build a custom dataset for a molecular generation task, as follows:

@R.register("datasets.Smol")

@doc.copy_args(data.MoleculeDataset.load_csv, ignore=("smiles_field", "target_fields"))

class Smol(data.MoleculeDataset):

  smiles_file = "/content/drive/MyDrive/molecule_design/resources/smiles_train.csv"
  target_fields = ["SPLIT"]

  def __init__(self, smiles_file, verbose=1, **kwargs):
    self.load_csv(self.smiles_file, smiles_field="smiles", target_fields=self.target_fields,lazy=True,
                      verbose=verbose, **kwargs)
    
  def split(self):
    indexes = defaultdict(list)
    for i, split in enumerate(self.targets["SPLIT"]):
        indexes[split].append(i)
    train_set = torch_data.Subset(self, indexes["train"])
    valid_set = torch_data.Subset(self, indexes["valid"])
    test_set = torch_data.Subset(self, indexes["test"])
    return train_set, valid_set, test_set

but I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-29-ada51874abcc> in <module>()
      3 @doc.copy_args(data.MoleculeDataset.load_csv, ignore=("smiles_field", "target_fields"))
      4 
----> 5 class Smol(data.MoleculeDataset):
      6 
      7   smiles_file = "/content/drive/MyDrive/molecule_design/resources/smiles_train.csv"

/usr/local/lib/python3.7/dist-packages/torchdrug/utils/doc.py in wrapper(obj)
     90         if len(docs) != len(parameters):
     91             raise ValueError("Fail to parse the docstring of `%s`. "
---> 92                              "Inconsistent number of parameters in signature and docstring." % obj.__name__)
     93         new_params = []
     94         new_docs = []

ValueError: Fail to parse the docstring of `Smol`. Inconsistent number of parameters in signature and docstring.

Did I miss something?

Aug 13 '22 15:08 CaiYitao

Hi! The decorator @doc.copy_args is only used for auto-filling the **kwargs part in the docstring. If your class doesn't have a valid docstring, please just remove the decorator.
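
To make the failure mode concrete, here is a toy sketch of the kind of check shown in the traceback (`len(docs) != len(parameters)`). It is not torchdrug's actual implementation: `check_param_docs`, `Documented`, `Undocumented`, and `passes_check` are hypothetical names, and the "name (type): description" docstring pattern is an assumption made for illustration only.

```python
import inspect
import re


def check_param_docs(cls):
    """Toy stand-in for a docstring check; NOT torchdrug's real code."""
    # Parameters of __init__, without `self`.
    names = [n for n in inspect.signature(cls.__init__).parameters if n != "self"]
    # Docstring lines shaped like "name (type): description" (assumed format).
    docs = re.findall(r"^\s*(\w+) \([^)]*\):", cls.__doc__ or "", flags=re.MULTILINE)
    if len(docs) != len(names):
        raise ValueError("Inconsistent number of parameters in signature "
                         "and docstring of `%s`." % cls.__name__)
    return cls


class Documented:
    """
    Parameters:
        path (str): csv file to load
        verbose (int): verbosity level
    """

    def __init__(self, path, verbose=1):
        self.path = path


class Undocumented:
    # No docstring: zero documented parameters versus two in the signature.
    def __init__(self, path, verbose=1):
        self.path = path


def passes_check(cls):
    """Return True if the toy check accepts the class, False if it raises."""
    try:
        check_param_docs(cls)
        return True
    except ValueError:
        return False
```

A class without a docstring has nothing to compare against its signature, which is why removing the decorator (or writing a complete docstring) makes the error go away.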

Aug 14 '22 22:08 KiddoZhu

Just tried it, works now! Thanks for your kind reply!

Aug 15 '22 15:08 CaiYitao

Now another problem has come up:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-c7e661af9d82> in <module>
      1 optimizer = optim.Adam(task.parameters(), lr = 1e-3)
      2 solver = core.Engine(task, train_set, valid_set, test_set, optimizer,
----> 3                      gpus=(0,), batch_size=128, log_interval=10)
<decorator-gen-126> in __init__(self, task, train_set, valid_set, test_set, optimizer, scheduler, gpus, batch_size, gradient_interval, num_worker, logger, log_interval)

2 frames
/usr/local/lib/python3.7/dist-packages/torchdrug/core/core.py in wrapper(init, self, *args, **kwargs)
    286                 config.pop(k)
    287             self._config = dict(config)
--> 288             return init(self, *args, **kwargs)
    289 
    290         def get_function(method):

/usr/local/lib/python3.7/dist-packages/torchdrug/core/engine.py in __init__(self, task, train_set, valid_set, test_set, optimizer, scheduler, gpus, batch_size, gradient_interval, num_worker, logger, log_interval)
     90             # handle dynamic parameters in optimizer
     91             old_params = list(task.parameters())
---> 92             result = task.preprocess(train_set, valid_set, test_set)
     93             if result is not None:
     94                 train_set, valid_set, test_set = result

/usr/local/lib/python3.7/dist-packages/torchdrug/tasks/generation.py in preprocess(self, train_set, valid_set, test_set)
    667         Compute ``max_edge_unroll`` and ``max_node`` on the training set if not provided.
    668         """
--> 669         remap_atom_type = transforms.RemapAtomType(train_set.atom_types)
    670         train_set.transform = transforms.Compose([
    671             train_set.transform,

AttributeError: 'Subset' object has no attribute 'atom_types'

I also tried core.PackedMolecules.from_smiles(smiles), but the result does not have atom_types either.

Should I divide the dataset first and save to train.csv, valid.csv, test.csv?
UPDATE: This was solved by dividing the data into train.csv, valid.csv, and test.csv and loading each file into the dataset class.
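
A minimal sketch of that split, using only the standard library. It assumes the CSV has a `SPLIT` column holding values such as `train`, `valid`, and `test`, as in the dataset class above; the function name `split_csv` and the `prefix` parameter are hypothetical.

```python
import csv
from collections import defaultdict


def split_csv(source, split_field="SPLIT", prefix=""):
    """Write one CSV per distinct value of `split_field`; return {value: path}."""
    rows = defaultdict(list)
    with open(source, newline="") as fin:
        reader = csv.DictReader(fin)
        fieldnames = reader.fieldnames
        for row in reader:
            rows[row[split_field]].append(row)

    paths = {}
    for name, group in rows.items():
        # e.g. "train.csv", "valid.csv", "test.csv" (optionally prefixed)
        path = "%s%s.csv" % (prefix, name)
        with open(path, "w", newline="") as fout:
            writer = csv.DictWriter(fout, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(group)
        paths[name] = path
    return paths
```

Each output file keeps the original header, so it can be loaded with the same `load_csv` arguments as the combined file.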

When I call solver.train(num_epoch=2) with an RGCN model, it raises the error below. When I train GraphAF instead, a single epoch takes over 12 hours (for around 1,000,000 SMILES). Is that expected?

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_1607319/1620135592.py in <module>
----> 1 solver.train(num_epoch=2)

~/.local/lib/python3.9/site-packages/torchdrug/core/engine.py in train(self, num_epoch, batch_per_epoch)
    153                     batch = utils.cuda(batch, device=self.device)
    154 
--> 155                 loss, metric = model(batch)
    156                 if not loss.requires_grad:
    157                     raise RuntimeError("Loss doesn't require grad. Did you define any loss in the task?")

/system/apps/userenv/students/yitaocai/conda/envs/system2/lib/python3.9/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

~/.local/lib/python3.9/site-packages/torchdrug/tasks/generation.py in forward(self, batch)
    696         for criterion, weight in self.criterion.items():
    697             if criterion == "nll":
--> 698                 _loss, _metric = self.MLE_forward(batch)
    699                 all_loss += _loss * weight
    700                 metric.update(_metric)
...
---> 72     return torch.ops.torch_scatter.scatter_max(src, index, dim, out, dim_size)
     73 
     74 

RuntimeError: Not compiled with CUDA support

Aug 18 '22 10:08 CaiYitao

Generative models don't need validation or test sets, so you can pass the full dataset as the training set and set both the validation and test sets to None. See the demo of GCPN / GraphAF.
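
As a rough sketch, the engine setup from the traceback above would then look like this. The `core.Engine` and `optim.Adam` arguments are copied from the snippet in the question; the wrapper function `build_generation_solver` is a hypothetical name, and the imports are placed inside it only so the snippet can be loaded on a machine without torchdrug installed.

```python
def build_generation_solver(task, dataset, lr=1e-3, batch_size=128):
    """Build an Engine that trains on the full dataset, with no valid/test sets."""
    # Lazy imports: an assumption for portability, not a torchdrug convention.
    from torch import optim
    from torchdrug import core

    optimizer = optim.Adam(task.parameters(), lr=lr)
    # Pass the whole dataset as the training set; valid and test are None.
    return core.Engine(task, dataset, None, None, optimizer,
                       gpus=(0,), batch_size=batch_size, log_interval=10)
```

Because the full dataset object is passed rather than a `Subset`, attributes such as `atom_types` remain available to `task.preprocess`.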

GraphAF can take a long time to train because it is an autoregressive model, though I can't recall its exact speed. The CUDA support error occurs because you installed the CPU version of torch_scatter rather than the GPU version.
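
One way to pick a matching torch_scatter build is to derive the wheel index URL from the installed torch version. The helper below is a sketch: `scatter_wheel_index` is a hypothetical name, and the URL pattern follows the PyG wheel index layout (`https://data.pyg.org/whl/torch-<version>+<cuda tag>.html`), which may not have a page for every torch patch release.

```python
def scatter_wheel_index(torch_version, cuda_version=None):
    """Build the PyG wheel index URL for a torch version and CUDA version.

    torch_version: e.g. "1.12.0" or "1.12.0+cu113" (as in torch.__version__)
    cuda_version:  e.g. "11.3" (as in torch.version.cuda), or None for CPU-only
    """
    base = torch_version.split("+")[0]  # drop a local tag such as "+cu113"
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return "https://data.pyg.org/whl/torch-%s+%s.html" % (base, tag)


# Usage (run where torch is installed):
#   import torch
#   print(scatter_wheel_index(torch.__version__, torch.version.cuda))
# then reinstall with:
#   pip install --force-reinstall torch-scatter -f <printed URL>
```

If the printed URL ends in `+cpu.html`, torch itself is a CPU-only build and needs to be replaced with a CUDA build first.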

Aug 20 '22 19:08 KiddoZhu