
Can PS deal with non-connected graphs?

Open Lyu6PosHao opened this issue 10 months ago • 2 comments

Great work! But I have some questions:

In ChEBI, some molecules are non-connected, such as O=C1O[C@H]([C@H](O)CO)C([O-])=C1O.[Na+] where Na+ is an isolated ion.

So I wonder whether PS can handle non-connected graphs? I tried it and got an error during tokenization. Thanks.

Lyu6PosHao, Apr 23 '24 12:04

Hi, thanks for your interest in our work! I have updated the repo so that it can process non-connected molecules at inference time (commits 507c8f9 and 8c4b7b4). The basic logic is to split the non-connected SMILES on '.' and treat each connected subgraph separately. However, there are some caveats to be aware of:

  1. The code for constructing the vocabulary is unchanged, so when building a vocabulary it is recommended to manually split non-connected molecules and treat each connected subgraph as an independent molecule.
  2. Any additional isolated ions need to be manually added to the atomic vocabulary and to the element-SMILES conversion logic (see commit 507c8f9).
  3. If the non-connected molecule contains ions, the charge on the organic part might be lost after reconstruction. For example, O=C1O[C@H]([C@H](O)CO)C([O-])=C1O.[Na+] might become O=C1O[C@H]([C@H](O)CO)C(O)=C1O.[Na+] when the molecule object is converted back to SMILES.
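
To make the points above concrete, here is a minimal sketch in plain Python of the manual preprocessing they describe. It is not the code from the commits mentioned above: the function names (`split_fragments`, `isolated_ions`, `lost_charges`) are hypothetical, and it works on the SMILES string only, without any chemistry toolkit. It also assumes that no ring-closure digit is shared across a '.', since in that case the two parts would actually be bonded and splitting on '.' would be wrong.

```python
import re
from collections import Counter

# A fragment that is exactly one bracket atom, e.g. "[Na+]".
_SINGLE_BRACKET_ATOM = re.compile(r"^\[[^\[\]]+\]$")
# Any bracket atom whose contents include a charge sign, e.g. "[O-]".
_CHARGED_ATOM = re.compile(r"\[[^\[\]]*[+-][^\[\]]*\]")


def split_fragments(smiles):
    """Split a SMILES string on '.' into its dot-separated parts.

    Assumption: no ring-closure digit is shared across a '.'.
    """
    return [part for part in smiles.split(".") if part]


def isolated_ions(smiles):
    """Return the single-atom charged fragments, e.g. ['[Na+]'].

    These are candidates for manual addition to the atomic vocabulary.
    """
    return sorted({
        part for part in split_fragments(smiles)
        if _SINGLE_BRACKET_ATOM.match(part) and _CHARGED_ATOM.search(part)
    })


def lost_charges(before, after):
    """Return charged bracket atoms present in `before` but missing in `after`."""
    missing = (Counter(_CHARGED_ATOM.findall(before))
               - Counter(_CHARGED_ATOM.findall(after)))
    return sorted(missing.elements())


original = "O=C1O[C@H]([C@H](O)CO)C([O-])=C1O.[Na+]"

# Point 1: each part can be fed to vocabulary construction on its own.
fragments = split_fragments(original)

# Point 2: ions that may need to be added to the atomic vocabulary.
ions = isolated_ions(original)

# Point 3: simulate a reconstruction that dropped the charge on the oxygen,
# then detect the loss by comparing charged atoms before and after.
reconstructed = original.replace("[O-]", "O")
dropped = lost_charges(original, reconstructed)

print(fragments)
print(ions)
print(dropped)
```

A string-level comparison like `lost_charges` only flags that a charged atom disappeared; it does not restore the charge, and a toolkit-based check of formal charges would be more robust.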

kxz18, Apr 24 '24 03:04

Thanks a lot!

Lyu6PosHao, Apr 25 '24 12:04