PS-VAE
Can PS deal with non-connected graphs?
Great work! But I have some questions:
In ChEBI, some molecules are non-connected, such as
O=C1O[C@H]([C@H](O)CO)C([O-])=C1O.[Na+]
where Na+ is an isolated ion.
So I wonder whether PS can handle non-connected graphs. I tried it and got an error during tokenization. Thanks.
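To find which molecules in a dataset such as ChEBI are non-connected, it is enough to look for '.', the SMILES disconnection symbol. Below is a minimal plain-Python sketch (no RDKit). `split_fragments` and `is_single_atom_ion` are hypothetical helpers, not part of PS-VAE, and the sketch assumes no ring-closure digit links atoms across a '.' (legal but rare in SMILES).

```python
import re

# One bracket atom and nothing else, e.g. [Na+], [Cl-] or [NH4+].
_SINGLE_BRACKET_ATOM = re.compile(r"^\[[^\[\]]+\]$")


def split_fragments(smiles: str) -> list[str]:
    """Split a SMILES on '.', the SMILES disconnection symbol.

    Assumes no ring-closure digit joins atoms across a '.', in which case
    the parts would still be bonded; that edge case is not handled here.
    """
    return [frag for frag in smiles.split(".") if frag]


def is_single_atom_ion(fragment: str) -> bool:
    """True if the fragment is one charged bracket atom, e.g. '[Na+]'."""
    return bool(_SINGLE_BRACKET_ATOM.match(fragment)) and (
        "+" in fragment or "-" in fragment
    )


smiles = "O=C1O[C@H]([C@H](O)CO)C([O-])=C1O.[Na+]"
fragments = split_fragments(smiles)
print(len(fragments) > 1)  # True: the molecule is non-connected
print([f for f in fragments if is_single_atom_ion(f)])  # ['[Na+]']
```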
Hi, thanks for your interest in our work! I have updated the repo so that non-connected molecules can be processed at inference time (commits 507c8f9 and 8c4b7b4). The basic logic is to split the non-connected SMILES on '.' and treat each connected subgraph separately. However, I noticed some issues that should be taken care of:
- The code for constructing the vocabulary has not been changed, so it is recommended to manually split non-connected molecules and treat each subgraph as an independent molecule.
- Additional isolated ions need to be manually added to the atomic vocabulary and to the element-SMILES conversion logic (see commit 507c8f9).
- If the non-connected molecule contains ions, the charge on the organic part might be lost after reconstruction. For example,
O=C1O[C@H]([C@H](O)CO)C([O-])=C1O.[Na+]
might become O=C1O[C@H]([C@H](O)CO)C(O)=C1O.[Na+]
after the molecule object is converted back to SMILES.
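The split-then-process idea from the reply can be sketched as below. This is a hypothetical wrapper, not the code in commits 507c8f9 / 8c4b7b4: `tokenize_connected` stands in for whatever PS-VAE function tokenizes a single connected molecule, and `toy_tokenizer` exists only so the sketch runs. `prepare_vocab_corpus` shows the recommended preprocessing for vocabulary construction, where every component becomes an independent molecule.

```python
from collections.abc import Callable, Iterable


def split_components(smiles: str) -> list[str]:
    """Split a SMILES on '.', the SMILES disconnection symbol."""
    return [frag for frag in smiles.split(".") if frag]


def tokenize_non_connected(
    smiles: str, tokenize_connected: Callable[[str], list[str]]
) -> list[list[str]]:
    """Tokenize each connected component on its own.

    `tokenize_connected` is a hypothetical stand-in for a tokenizer that only
    accepts connected molecules; it is called once per component.
    """
    return [tokenize_connected(frag) for frag in split_components(smiles)]


def prepare_vocab_corpus(smiles_list: Iterable[str]) -> list[str]:
    """Flatten a dataset so every component becomes an independent molecule."""
    return [frag for smi in smiles_list for frag in split_components(smi)]


def toy_tokenizer(fragment: str) -> list[str]:
    # Placeholder only: one token per character, NOT the PS-VAE tokenizer.
    return list(fragment)


print(tokenize_non_connected("CCO.[Na+]", toy_tokenizer))
# [['C', 'C', 'O'], ['[', 'N', 'a', '+', ']']]
print(prepare_vocab_corpus(["O=C1O[C@H]([C@H](O)CO)C([O-])=C1O.[Na+]", "CCO"]))
# ['O=C1O[C@H]([C@H](O)CO)C([O-])=C1O', '[Na+]', 'CCO']
```

Single-atom fragments such as `[Na+]` end up in the corpus as one-atom molecules, so, as noted above, those ions still have to be present in the atomic vocabulary.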
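Since the charge on the organic part can be dropped during reconstruction, a cheap sanity check is to compare the net formal charge written in the bracket atoms of the input and the reconstructed SMILES. This is a minimal regex-based sketch, assuming charges only appear inside bracket atoms (as in SMILES) and ignoring atom classes; `net_formal_charge` is a hypothetical helper. With RDKit installed, `Chem.GetFormalCharge(Chem.MolFromSmiles(smi))` would be the more robust choice. The example simulates the loss by replacing `[O-]` with `O`.

```python
import re

# Bracket atoms, e.g. [O-], [Na+], [NH4+], [Fe+3], [C@@H].
_BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")
# Charge at the end of a bracket atom: '+', '-', '++', '+3', '-2', ...
_CHARGE = re.compile(r"(\++|-+)(\d*)$")


def net_formal_charge(smiles: str) -> int:
    """Sum the formal charges written in the bracket atoms of a SMILES."""
    total = 0
    for content in _BRACKET_ATOM.findall(smiles):
        content = content.split(":")[0]  # drop an atom-class suffix like ':1'
        match = _CHARGE.search(content)
        if match is None:
            continue  # uncharged bracket atom such as [C@H]
        signs, digits = match.groups()
        sign = 1 if signs[0] == "+" else -1
        # '+3' and '+++' both mean a charge of +3.
        magnitude = int(digits) if digits else len(signs)
        total += sign * magnitude
    return total


original = "O=C1O[C@H]([C@H](O)CO)C([O-])=C1O.[Na+]"
# Simulate the charge on the organic part being dropped.
reconstructed = original.replace("[O-]", "O")
print(net_formal_charge(original))  # 0
print(net_formal_charge(reconstructed))  # 1
```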
Thanks a lot!