GraphAF
Benchmark comparison.
Thank you for making your code available, it has been a great help for our project work!
Would you consider adding molecular property distribution plots to the README? While working with the MOSES dataset, we found that the property distributions of molecules generated by GraphAF trained on MOSES differ significantly from those of the training data, at least for the properties we examined. See the plot below.
[Plot: Properties of generated molecules]
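For anyone who wants to quantify this kind of mismatch rather than eyeball it, here is a minimal sketch of the empirical 1D Wasserstein-1 distance between two equal-size property samples (it reduces to the mean absolute difference of the sorted values). The molecular-weight numbers below are made up for illustration; real values would come from a descriptor calculator such as RDKit run over the MOSES training set and the generated molecules.

```python
def wasserstein_1d(a, b):
    """Empirical 1D Wasserstein-1 distance between two equal-size samples:
    the mean absolute difference of the sorted values."""
    assert len(a) == len(b), "this shortcut only works for equal sample sizes"
    a, b = sorted(a), sorted(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Hypothetical molecular-weight samples (placeholder numbers, not real data).
train_mw = [250.0, 310.0, 280.0, 330.0]
gen_mw = [120.0, 150.0, 140.0, 180.0]
print(wasserstein_1d(train_mw, gen_mw))  # one number summarizing the shift
```

For unequal sample sizes, a general implementation (e.g. `scipy.stats.wasserstein_distance`) integrates the difference between the two empirical CDFs instead.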
Admittedly this is not the Zinc250k dataset. It is not addressed in your paper, so I just wanted to make you aware of this possible issue.
The GraphAF implementation also resamples whenever a generated molecule has fewer than 5 atoms, and we found this happened very frequently (>50% of the time), even though there are no molecules under this size in your training data.
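To make the resampling effect concrete, here is a toy sketch of how one could measure the rejection rate of such a minimum-size filter. The generator below is a mock that just draws an atom count, chosen so that roughly half of the raw samples fall under the threshold; it stands in for an actual model and is not GraphAF's code.

```python
import random

MIN_ATOMS = 5  # molecules below this size are rejected and redrawn

def sample_with_resampling(generate, max_tries=100):
    """Draw until a sample has at least MIN_ATOMS atoms; also return
    how many draws were rejected along the way."""
    rejected = 0
    for _ in range(max_tries):
        num_atoms = generate()
        if num_atoms >= MIN_ATOMS:
            return num_atoms, rejected
        rejected += 1
    raise RuntimeError("no molecule of sufficient size generated")

# Mock generator: atom counts uniform on 1..8, so half the raw draws
# are under the 5-atom threshold.
random.seed(0)
mock = lambda: random.randint(1, 8)
sizes, rejections = zip(*(sample_with_resampling(mock) for _ in range(1000)))
rate = sum(rejections) / (sum(rejections) + len(sizes))
print(f"fraction of raw samples rejected: {rate:.2f}")
```

A rejection rate like this matters because the reported samples are then drawn from a truncated distribution, not the one the model actually learned.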
The generated molecules were in general very small (see, e.g., the molecular weight distribution), which inflates "validity w/o check": a shorter generation sequence leaves less room for errors, much as a language model that only produces short words has less chance of making spelling mistakes. This also contributes to a high novelty.
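The size effect is easy to see with a back-of-the-envelope model: if each generation step independently introduces an error with some probability p, a molecule built in n steps is valid with probability (1-p)^n, so smaller molecules are automatically "more valid". The per-step error rate below is a made-up illustration, not a measured quantity.

```python
# Toy model: per-step error probability p, validity of an n-step
# molecule is (1 - p) ** n. p = 0.02 is a hypothetical value.
p = 0.02
for n in (5, 10, 20, 40):
    print(f"n={n:>2}: validity ~= {(1 - p) ** n:.2f}")
```

Under this model, halving the typical molecule size can raise validity substantially without the model getting any better at chemistry.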
This makes me feel that validity, novelty, and uniqueness are not very useful measures of a molecular generative model's performance, since all three can be increased without fitting the training data any better. Low validity implies a bad fit, but high validity does not necessarily imply a good one.
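A deliberately degenerate toy generator makes this point vivid: one that emits random unbranched alkane SMILES (just "C" repeated a random number of times) scores perfectly on validity and novelty and very high on uniqueness, while fitting any realistic training distribution terribly. The three training SMILES below are made up for the example.

```python
import random

def degenerate_generator(rng):
    """Emit a random unbranched alkane SMILES, e.g. 'CCC'.
    Every output is a syntactically and chemically valid molecule,
    however chemically uninteresting."""
    return "C" * rng.randint(1, 10_000)

rng = random.Random(0)
training_set = {"c1ccccc1", "CCO", "CC(=O)O"}  # hypothetical training SMILES
samples = [degenerate_generator(rng) for _ in range(1000)]

validity = 1.0  # linear alkanes are always valid, so no check can fail
uniqueness = len(set(samples)) / len(samples)
novelty = len(set(samples) - training_set) / len(set(samples))
print(validity, uniqueness, novelty)
```

All three metrics come out near-perfect, yet a distribution plot of, say, molecular weight would immediately expose the generator as useless, which is exactly the argument for including such plots.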
Of course the same can probably be said of most metrics, and there is a limit to what can be included in a paper. But in this area, distribution plots really are worth a thousand words.
Again, thanks for your work! :)