keras-molecules

LogP calculations for 50k dataset

Open tantrev opened this issue 8 years ago • 4 comments

LogP calculations for the 50k dataset. Same order as original SMILES file.

Nov 04 '16 02:11 tantrev

What's the origin of these? Are these measured logPs based on looking up the SMILES? Or cLogP from rdkit? Something else?

Nov 06 '16 00:11 maxhodak

Sorry, I should have specified. The LogP values were generated using ChemAxon's "generatemd" program from this repository's "smiles_50k.h5" file.

Nov 06 '16 00:11 tantrev

What would be really great is if you could alter the smiles_50k.h5 file to include a clogp column with this data indexed to the right rows, instead of including this separately as a txt file. Then it would work with the --property_column parameter on the preprocessing script and the rest of the tooling here.
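A minimal sketch of that change, under several assumptions not confirmed in this thread: the HDF5 file is a pandas table stored under the key `table`, the txt file holds one LogP value per line in the same row order as the SMILES, and the file names in the usage note are hypothetical. Writing HDF5 from pandas also requires PyTables to be installed.

```python
import pandas as pd


def attach_property(df, values, column="clogp"):
    """Return a copy of df with `values` added as a new column, matched by row order."""
    values = list(values)
    if len(values) != len(df):
        # A length mismatch means the values cannot be aligned row by row.
        raise ValueError(
            "expected %d values, got %d" % (len(df), len(values))
        )
    out = df.copy()
    # Reuse the frame's own index so each value lands on its original row.
    out[column] = pd.Series(values, index=df.index, dtype="float64")
    return out


def add_column_to_hdf(h5_in, txt_in, h5_out, key="table", column="clogp"):
    """Read the table, attach the per-line values, and write a new HDF5 file.

    The key name "table" is an assumption about how the file was written.
    """
    df = pd.read_hdf(h5_in, key)
    with open(txt_in) as fh:
        # One numeric value per non-empty line is assumed.
        values = [float(line) for line in fh if line.strip()]
    attach_property(df, values, column).to_hdf(
        h5_out, key=key, mode="w", format="table"
    )
```

Usage would look something like `add_column_to_hdf("smiles_50k.h5", "logp_50k.txt", "smiles_50k_clogp.h5")`. Writing to a new file rather than overwriting leaves the original dataset intact until the row alignment has been checked.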

Nov 06 '16 14:11 maxhodak

I'm curious whether including data generated with ChemAxon's software would be a problem. Specifically, are there any licensing issues?

Nov 09 '16 12:11 hsiaoyi0504