keras-molecules

Reproduce GDB-17 construction pipeline

Open dakoner opened this issue 7 years ago • 4 comments

This is probably out of scope for this project, but I think this may be the right community to ask.

I read the GDB-17 construction paper (http://pubs.acs.org/doi/abs/10.1021/ci300415d). GDB-17 explicitly enumerates a subset of the total chemical space of molecules with up to 17 atoms of C, N, O, S and halogens.

GDB-17 itself is not available in full, although the paper says that a sample of 1-2M entries is available and is sufficient for any training purposes.

The paper describes the construction process in enough detail to reproduce it (although it would take effort), but does not provide code to do so. A number of the construction steps are fairly subtle (particularly with respect to rings, conjugation and aromatics), and there are a few "arbitrary" pruning choices that I'm not happy with. I believe a simple distributed pipeline could produce GDB-17, or an even better library, automatically. Once the pipeline is built, larger GDBs could be constructed easily.
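To make the pipeline idea concrete, here is a minimal sketch of what a first, shardable stage might look like: enumerating connected graph skeletons whose nodes have at most four neighbours (a carbon-like valence cap), then merging the shards by canonical form. This is not the paper's procedure. The function names, the striding scheme for sharding and the brute-force canonicalization are all assumptions for illustration, and the approach only works for very small node counts; a real pipeline would need a proper graph generator plus the chemistry-specific filters the paper describes.

```python
from itertools import combinations, permutations


def is_valid(n, edges, max_degree):
    """Return True if the graph is connected and no node exceeds max_degree."""
    degree = [0] * n
    adjacency = {v: set() for v in range(n)}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        adjacency[a].add(b)
        adjacency[b].add(a)
    if max(degree) > max_degree:
        return False
    # Depth-first search from node 0 to check connectivity.
    seen, stack = {0}, [0]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == n


def canonical_form(n, edges):
    """Brute-force canonical label: the smallest relabelled edge list.

    Tries every node permutation, so it is only usable for tiny n.
    """
    return min(
        tuple(sorted(tuple(sorted((p[a], p[b]))) for a, b in edges))
        for p in permutations(range(n))
    )


def shard_skeletons(n, shard, num_shards, max_degree=4):
    """Yield canonical skeletons found in one shard of the edge-subset space.

    Hypothetical sharding scheme: shard k handles every num_shards-th
    edge bitmask, so shards share no state and could run on separate workers.
    """
    pairs = list(combinations(range(n), 2))
    for mask in range(shard, 2 ** len(pairs), num_shards):
        edges = [pairs[i] for i in range(len(pairs)) if mask >> i & 1]
        if is_valid(n, edges, max_degree):
            yield canonical_form(n, edges)


def enumerate_skeletons(n, num_shards=4, max_degree=4):
    """Merge step: union the shard outputs, which removes isomorphic duplicates."""
    merged = set()
    for shard in range(num_shards):  # run sequentially here; parallel in practice
        merged.update(shard_skeletons(n, shard, num_shards, max_degree))
    return merged


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, len(enumerate_skeletons(n)))
```

Because each shard only emits canonical forms, the merge step is a plain set union, which is the property that would let the work be spread over many machines. Later stages (ring-strain filters, unsaturations, heteroatom substitution) would be further filter or expand steps applied to this output.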

The challenge here is parsing what they describe in the methods section and converting that to code, since several of the steps are not spelled out in full detail.

dakoner avatar Nov 14 '16 14:11 dakoner

Hi @dakoner. I believe your request is out of scope for this project (although I'm not the author of or a contributor to this code), but I'd be very interested in writing code that could generate GDB-17, or GDB-N in general. If you are interested, please get in touch with me.

mnowotka avatar Nov 15 '16 22:11 mnowotka

I'd be interested to recapitulate GDB-n as well.

pechersky avatar Nov 17 '16 13:11 pechersky

I'd be interested to recapitulate GDB-n as well, but as @mnowotka mentioned, I think another repo for this would be better.

hsiaoyi0504 avatar Dec 06 '16 12:12 hsiaoyi0504

I think this is a great idea, and I would be interested in helping or supporting.

Tipizen avatar Dec 24 '16 15:12 Tipizen