chemical_vae
How to train your VAE
If I have a new dataset, how can I use your code to train on it? It would be great if you could provide a procedure. Do you have any documentation for this code? Many thanks.
Hi,
I'll write up a more thorough procedure and fix some documentation later, but to answer your question quickly:
- Have your data prepared as a CSV file. One column should contain the SMILES strings; the other columns should contain the properties you want to predict. This file will be read later by mol_utils.load_smiles_and_data_df.
- Create a JSON file containing the characters found in the SMILES strings of this file. mol_utils.make_charset can help you do that.
- Copy models/zinc/exp.json and change its data_file and char_file fields to match your experiment.
- Run chem.train_vae from the directory containing the exp.json.
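The first three steps can be scripted. Below is a minimal sketch using only the Python standard library. It assumes the SMILES are in a CSV column named "smiles" and that a character-level charset with a space as the padding character is what training expects; both are assumptions to check against the files shipped in models/zinc. The helpers charset_from_smiles, read_smiles_column, point_exp_at and prepare_experiment, and the output name charset.json, are hypothetical and not part of chemical_vae. Where mol_utils.make_charset fits your data, prefer it.

```python
import csv
import json
from pathlib import Path


def charset_from_smiles(smiles_list, pad_char=" "):
    """Return the sorted list of distinct characters used in the SMILES strings.

    Hypothetical stand-in for mol_utils.make_charset; the real helper may
    handle padding or multi-character tokens differently.
    """
    chars = set()
    for smi in smiles_list:
        chars.update(smi.strip())  # character-level: "Cl" contributes "C" and "l"
    if pad_char is not None:
        chars.add(pad_char)  # assumed padding character; compare with the zinc charset
    return sorted(chars)


def read_smiles_column(csv_path, smiles_column="smiles"):
    """Read one column of SMILES strings from a CSV file with a header row."""
    with open(csv_path, newline="") as f:
        return [row[smiles_column] for row in csv.DictReader(f)]


def point_exp_at(params, data_file, char_file):
    """Return a copy of the exp.json parameters with data_file/char_file replaced."""
    updated = dict(params)  # shallow copy; every other field is kept unchanged
    updated["data_file"] = data_file
    updated["char_file"] = char_file
    return updated


def prepare_experiment(csv_path, template_exp, out_dir, smiles_column="smiles"):
    """Write <out_dir>/charset.json and <out_dir>/exp.json for a new dataset."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    charset = charset_from_smiles(read_smiles_column(csv_path, smiles_column))
    char_path = out_dir / "charset.json"
    char_path.write_text(json.dumps(charset))
    params = json.loads(Path(template_exp).read_text())
    # Absolute paths, assuming the training code opens these paths as given.
    params = point_exp_at(params, str(Path(csv_path).resolve()), str(char_path.resolve()))
    (out_dir / "exp.json").write_text(json.dumps(params, indent=2))
    return params
```

For example, prepare_experiment("my_data.csv", "models/zinc/exp.json", "models/my_data") would write both files into models/my_data, after which training is started from that directory. Keeping the file I/O separate from the pure helpers makes it easy to inspect the charset or the modified parameters before anything is written.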
Let me know if you have more questions.
(Sent by email in reply to xuzhang5788's question of Thu, May 17, 2018 at 7:59 PM.)
Hi @jnwei, I wonder whether it is possible to first train on a large dataset such as ZINC without properties, and then on a much smaller one with a property such as bioactivity against a target, so that I can generate compounds with that bioactivity?
Update:
It seems this question was already raised in #5, but no one has replied.
Did you manage to solve this?