ValueError: Fail to parse the docstring of `Smol`. Inconsistent number of parameters in signature and docstring.
I am trying to build a customized dataset for the molecular generation task, as follows.
from collections import defaultdict

from torch.utils import data as torch_data
from torchdrug import data
from torchdrug.core import Registry as R
from torchdrug.utils import doc


@R.register("datasets.Smol")
@doc.copy_args(data.MoleculeDataset.load_csv, ignore=("smiles_field", "target_fields"))
class Smol(data.MoleculeDataset):

    smiles_file = "/content/drive/MyDrive/molecule_design/resources/smiles_train.csv"
    target_fields = ["SPLIT"]

    def __init__(self, smiles_file, verbose=1, **kwargs):
        self.load_csv(self.smiles_file, smiles_field="smiles", target_fields=self.target_fields,
                      lazy=True, verbose=verbose, **kwargs)

    def split(self):
        # group sample indexes by the value of the SPLIT target column
        indexes = defaultdict(list)
        for i, split in enumerate(self.targets["SPLIT"]):
            indexes[split].append(i)
        train_set = torch_data.Subset(self, indexes["train"])
        valid_set = torch_data.Subset(self, indexes["valid"])
        test_set = torch_data.Subset(self, indexes["test"])
        return train_set, valid_set, test_set
but I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-29-ada51874abcc> in <module>()
3 @doc.copy_args(data.MoleculeDataset.load_csv, ignore=("smiles_field", "target_fields"))
4
----> 5 class Smol(data.MoleculeDataset):
6
7 smiles_file = "/content/drive/MyDrive/molecule_design/resources/smiles_train.csv"
/usr/local/lib/python3.7/dist-packages/torchdrug/utils/doc.py in wrapper(obj)
90 if len(docs) != len(parameters):
91 raise ValueError("Fail to parse the docstring of `%s`. "
---> 92 "Inconsistent number of parameters in signature and docstring." % obj.__name__)
93 new_params = []
94 new_docs = []
ValueError: Fail to parse the docstring of `Smol`. Inconsistent number of parameters in signature and docstring.
Did I miss something?
Hi! The decorator @doc.copy_args is only used for auto-filling the **kwargs part in the docstring. If your class doesn't have a valid docstring, please just remove the decorator.
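If you want to keep the decorator instead, the class needs a docstring whose Parameters section matches the __init__ signature. Here is a rough sketch of that option; the exact docstring format the parser expects is my assumption, modeled on the built-in TorchDrug datasets:

@R.register("datasets.Smol")
@doc.copy_args(data.MoleculeDataset.load_csv, ignore=("smiles_field", "target_fields"))
class Smol(data.MoleculeDataset):
    """
    Custom SMILES dataset for molecular generation.

    Parameters:
        smiles_file (str): path to the CSV file with a ``smiles`` column
        verbose (int, optional): output verbose level
        **kwargs
    """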
Just tried it, works now! Thanks for your kind reply!
Now another problem comes up:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-9-c7e661af9d82> in <module>
1 optimizer = optim.Adam(task.parameters(), lr = 1e-3)
2 solver = core.Engine(task, train_set, valid_set, test_set, optimizer,
----> 3 gpus=(0,), batch_size=128, log_interval=10)
<decorator-gen-126> in __init__(self, task, train_set, valid_set, test_set, optimizer, scheduler, gpus, batch_size, gradient_interval, num_worker, logger, log_interval)
2 frames
/usr/local/lib/python3.7/dist-packages/torchdrug/core/core.py in wrapper(init, self, *args, **kwargs)
286 config.pop(k)
287 self._config = dict(config)
--> 288 return init(self, *args, **kwargs)
289
290 def get_function(method):
/usr/local/lib/python3.7/dist-packages/torchdrug/core/engine.py in __init__(self, task, train_set, valid_set, test_set, optimizer, scheduler, gpus, batch_size, gradient_interval, num_worker, logger, log_interval)
90 # handle dynamic parameters in optimizer
91 old_params = list(task.parameters())
---> 92 result = task.preprocess(train_set, valid_set, test_set)
93 if result is not None:
94 train_set, valid_set, test_set = result
/usr/local/lib/python3.7/dist-packages/torchdrug/tasks/generation.py in preprocess(self, train_set, valid_set, test_set)
667 Compute ``max_edge_unroll`` and ``max_node`` on the training set if not provided.
668 """
--> 669 remap_atom_type = transforms.RemapAtomType(train_set.atom_types)
670 train_set.transform = transforms.Compose([
671 train_set.transform,
AttributeError: 'Subset' object has no attribute 'atom_types'
I also tried core.PackedMolecules.from_smiles(smiles), but it does not have atom_types either.
Should I split the dataset first and save it as train.csv, valid.csv, and test.csv?
UPDATE: this was solved by dividing the data into train.csv, valid.csv, and test.csv and loading each into its own dataset class.
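For anyone hitting the same AttributeError: building one dataset per split, instead of wrapping a single dataset in torch.utils.data.Subset, keeps each set a full MoleculeDataset with attributes like atom_types. A rough sketch of that workaround; the SmolSplit class name and the file paths are hypothetical:

from torchdrug import data
from torchdrug.core import Registry as R


@R.register("datasets.SmolSplit")
class SmolSplit(data.MoleculeDataset):
    """One split of the SMILES data, loaded from its own CSV file."""

    def __init__(self, smiles_file, verbose=1, **kwargs):
        self.load_csv(smiles_file, smiles_field="smiles", lazy=True,
                      verbose=verbose, **kwargs)


# hypothetical paths; point these at your own per-split CSV files
train_set = SmolSplit("path/to/train.csv")
valid_set = SmolSplit("path/to/valid.csv")
test_set = SmolSplit("path/to/test.csv")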
When calling solver.train(n_epochs=1) for model.RGCN, the following error is raised. However, training the GraphAF model takes over 12 hours for one epoch (on around 1,000,000 SMILES). Is that expected?
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_1607319/1620135592.py in <module>
----> 1 solver.train(num_epoch=2)
~/.local/lib/python3.9/site-packages/torchdrug/core/engine.py in train(self, num_epoch, batch_per_epoch)
153 batch = utils.cuda(batch, device=self.device)
154
--> 155 loss, metric = model(batch)
156 if not loss.requires_grad:
157 raise RuntimeError("Loss doesn't require grad. Did you define any loss in the task?")
/system/apps/userenv/students/yitaocai/conda/envs/system2/lib/python3.9/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
~/.local/lib/python3.9/site-packages/torchdrug/tasks/generation.py in forward(self, batch)
696 for criterion, weight in self.criterion.items():
697 if criterion == "nll":
--> 698 _loss, _metric = self.MLE_forward(batch)
699 all_loss += _loss * weight
700 metric.update(_metric)
...
---> 72 return torch.ops.torch_scatter.scatter_max(src, index, dim, out, dim_size)
73
74
RuntimeError: Not compiled with CUDA support
For generative models, there is no need for validation and test sets. You may pass the full dataset as the training set, and set None for both the validation and test sets. See the demos of GCPN / GraphAF.
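In code, that looks roughly like the following (a sketch modeled on the GraphAF demo; dataset, task, and hyperparameters follow the snippets above):

from torch import optim
from torchdrug import core

optimizer = optim.Adam(task.parameters(), lr=1e-3)
# the full dataset serves as the training set;
# validation and test sets are simply None
solver = core.Engine(task, dataset, None, None, optimizer,
                     gpus=(0,), batch_size=128, log_interval=10)
solver.train(num_epoch=1)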
GraphAF might take a long time to train since it is an autoregressive model, though I can't recall its exact speed. As for the CUDA support problem, it is because you installed the CPU version rather than the GPU version of torch_scatter.
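You can check which build of torch_scatter you have with a small probe (assuming a machine with a visible CUDA device):

import torch
import torch_scatter

# on a CPU-only build of torch_scatter, this raises
# "RuntimeError: Not compiled with CUDA support"
src = torch.ones(4, device="cuda")
index = torch.zeros(4, dtype=torch.long, device="cuda")
out, argmax = torch_scatter.scatter_max(src, index)
print(out)

If it fails, reinstall torch_scatter from the wheel index matching your setup, e.g. pip install torch-scatter -f https://data.pyg.org/whl/torch-1.9.0+cu111.html, replacing 1.9.0 and cu111 with the values reported by torch.__version__ and torch.version.cuda.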