deep-molecular-optimization
Generation step results in KeyError
When I try to run the generation step using the following command:
(molopt) user@machine:~/deep-molecular-optimization$ python generate.py --model-choice transformer --data-path data/chembl_02 --test-file-name test_unseen_L-1_S01_C10_range --model-path /home/user/Downloads/models/experiments/trained/Transformer/MMP/checkpoint/ --save-directory evaluation_transformer --epoch 60
I get the following output:
15:37:01: generate.__init__ +31: INFO Namespace(batch_size=128, data_path='data/chembl_02', decode_type='multinomial', epoch=60, model_choice='transformer', model_path='/home/user/Downloads/models/experiments/trained/Transformer/MMP/checkpoint/', num_samples=10, save_directory='evaluation_transformer', test_file_name='test_unseen_L-1_S01_C10_range')
15:37:01: generate.__init__ +32: INFO Save directory: experiments/evaluation_transformer/test_unseen_L-1_S01_C10_range/evaluation_60
Allocating cuda:0.
/home/user/deep-molecular-optimization/models/transformer/encode_decode/model.py:59: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
nn.init.xavier_uniform(p)
0%| | 0/62 [00:09<?, ?it/s]
Traceback (most recent call last):
File "generate.py", line 252, in <module>
run_main()
File "generate.py", line 248, in run_main
runner.generate(opt)
File "generate.py", line 92, in generate
device=device)
File "generate.py", line 169, in sample
smi = self.tokenizer.untokenize(self.vocab.decode(seq.cpu().numpy()))
File "/home/user/deep-molecular-optimization/preprocess/vocabulary.py", line 66, in decode
tokens.append(self[ohv])
File "/home/user/deep-molecular-optimization/preprocess/vocabulary.py", line 23, in __getitem__
return self._tokens[token_or_id]
KeyError: 121
If I run this a number of times, the key it can't locate changes every time. Am I running this incorrectly?
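For context on the failure mode: vocab.decode looks each sampled token id up in a dict (self._tokens in preprocess/vocabulary.py, per the traceback), so any id the loaded vocabulary doesn't contain raises a KeyError, and a multinomial decoder will sample a different out-of-range id on each run. Below is a minimal sketch of the mechanism; the class mirrors the shape of the repo's Vocabulary, but the token list is made up:

# Minimal reproduction of the failure mode, not the repo's actual class.
# A vocab with N tokens maps ids 0..N-1; a checkpoint trained against a
# larger vocab can emit ids >= N, and the dict lookup then raises KeyError.
class Vocabulary:
    def __init__(self, tokens):
        # id -> token mapping, analogous to self._tokens in vocabulary.py
        self._tokens = {i: t for i, t in enumerate(tokens)}

    def __getitem__(self, token_or_id):
        return self._tokens[token_or_id]

    def decode(self, ids):
        return [self[i] for i in ids]

small_vocab = Vocabulary(["<pad>", "<start>", "<end>", "C", "N", "O"])
print(small_vocab.decode([1, 3, 4, 2]))  # ['<start>', 'C', 'N', '<end>']
small_vocab.decode([1, 121, 2])          # KeyError: 121 -- id from a larger vocab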
I've run into the same error. I tried to wrap this in an exception handler at line 117 in generate.py:

try:
    smi = self.tokenizer.untokenize(self.vocab.decode(seq.cpu().numpy()))
    smi = uc.get_canonical_smile(smi)
    smiles.append(smi)
except KeyError:
    print("Key Error")

However, this doesn't appear to have helped: the output only had blank values for TargetMol. Also, the README appears to be out of date; it lists an argument "--vocab-path" to generate.py that doesn't appear to be supported. Can you post a working example?
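If anyone else tries this workaround: catching the KeyError only hides the symptom (the sampled sequence is silently dropped, which is why TargetMol comes out blank). A slightly more informative variant, assuming the same surrounding variables (seq, smiles, uc) as in generate.py:

try:
    smi = self.tokenizer.untokenize(self.vocab.decode(seq.cpu().numpy()))
    smi = uc.get_canonical_smile(smi)
    smiles.append(smi)
except KeyError as err:
    # The sampled id has no entry in the loaded vocabulary -- this points to
    # a vocab/checkpoint mismatch rather than a bad molecule.
    print(f"KeyError: token id {err} not in vocabulary; emitting empty SMILES")
    smiles.append("")  # placeholder keeps output rows aligned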
Hi @spadavec, I think it might be caused by an inconsistency between the vocab and the model checkpoint. Note that there are two branches for two different publications, and the pre-trained models and vocabs cannot be used interchangeably between them. For example, if you run the generation step from the master branch, you can't use a model downloaded from the "general_transformation" branch (which, judging by the model path you provided, I guess you are doing).
Hi @PatWalters, can I ask which branch you are using, and whether you downloaded the model and vocab from the link in that same branch? Also, "--vocab-path" is only supported on the "general_transformation" branch.
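One way to catch this mismatch before generating is to compare the vocabulary size against the embedding shapes in the checkpoint. This is only a sketch: the paths, the checkpoint filename, and the state-dict layout are assumptions, so inspect the keys of your own files first:

import pickle
import torch

# Hypothetical paths -- adjust to your setup.
with open("data/MMP/vocab.pkl", "rb") as fh:
    vocab = pickle.load(fh)
ckpt = torch.load("experiments/trained/Transformer-U/MMP/checkpoint/model_60.pt",
                  map_location="cpu")

# Layout varies; if the file stores a whole nn.Module, fall back to state_dict().
state = ckpt.get("model_state_dict", ckpt) if isinstance(ckpt, dict) else ckpt.state_dict()

print("vocab size:", len(vocab))  # assumes the Vocabulary class implements __len__
for name, tensor in state.items():
    if "embed" in name and tensor.dim() == 2:
        print(name, tuple(tensor.shape))
# If the embedding row count differs from the vocab size, the model and vocab
# don't match, and sampling can emit ids (e.g. 121) the vocab cannot decode.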
Here is the workflow that worked for me to run generate.py for modifying some SMILES strings:
I cloned the repository -> checked out the branch "general_transformation" -> downloaded model.tar.gz and data.tar.gz from https://zenodo.org/record/6319821#.YwSlnnbP2gY (version 2!) -> extracted the two archives into the "deep-molecular-optimization" folder -> then ran:
!python generate.py --model-choice transformer \
  --data-path "/home/mpr/2_git_repos/deep-molecular-optimization/data/MMP" \
  --test-file-name "small" \
  --epoch 60 \
  --save-directory "/home/mpr/2_git_repos/deep-molecular-optimization/experiments/evaluation_transformer" \
  --model-path "/home/mpr/2_git_repos/deep-molecular-optimization/experiments/trained/Transformer-U/MMP/checkpoint" \
  --vocab-path "/home/mpr/2_git_repos/deep-molecular-optimization/data/MMP/vocab.pkl" \
  --without-property
A few notes on the arguments (don't interleave # comments inside the command: a comment line after a continuation backslash splits the rest into a separate command):
- --data-path is the folder containing the CSV file "small.csv", which holds a small set of SMILES strings.
- --test-file-name is the name of that CSV file ("small" for small.csv).
- --vocab-path: it is important to use the vocab.pkl that matches the model; in this case I used MMP.
- --without-property, because I didn't need the property prediction part.
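For anyone unsure what the input file should contain: it is a CSV of source molecules. The exact column schema may differ per branch, so treat this as a guess (the column name "Source_Mol" is an assumption) and copy the headers from the test CSVs shipped under data/MMP instead:

import csv

# Hypothetical minimal small.csv for --test-file-name "small".
# "Source_Mol" is assumed -- mirror the headers of the repo's own test files.
rows = [{"Source_Mol": "CCO"}, {"Source_Mol": "c1ccccc1"}]
with open("data/MMP/small.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["Source_Mol"])
    writer.writeheader()
    writer.writerows(rows)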
This worked very well for me
Thank you for this nice tool!