
generation step results in key error

Open spadavec opened this issue 2 years ago • 4 comments

When I try to run the generation step using the following command:

(molopt) user@machine:~/deep-molecular-optimization$ python generate.py --model-choice transformer --data-path data/chembl_02 --test-file-name test_unseen_L-1_S01_C10_range --model-path /home/user/Downloads/models/experiments/trained/Transformer/MMP/checkpoint/ --save-directory evaluation_transformer --epoch 60

I get the following output

15:37:01: generate.__init__ +31: INFO     Namespace(batch_size=128, data_path='data/chembl_02', decode_type='multinomial', epoch=60, model_choice='transformer', model_path='/home/user/Downloads/models/experiments/trained/Transformer/MMP/checkpoint/', num_samples=10, save_directory='evaluation_transformer', test_file_name='test_unseen_L-1_S01_C10_range')
15:37:01: generate.__init__ +32: INFO     Save directory: experiments/evaluation_transformer/test_unseen_L-1_S01_C10_range/evaluation_60
Allocating cuda:0.

/home/user/deep-molecular-optimization/models/transformer/encode_decode/model.py:59: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  nn.init.xavier_uniform(p)
  0%|                                                                                                                                                                                                         | 0/62 [00:09<?, ?it/s]
Traceback (most recent call last):
  File "generate.py", line 252, in <module>
    run_main()
  File "generate.py", line 248, in run_main
    runner.generate(opt)
  File "generate.py", line 92, in generate
    device=device)
  File "generate.py", line 169, in sample
    smi = self.tokenizer.untokenize(self.vocab.decode(seq.cpu().numpy()))
  File "/home/user/deep-molecular-optimization/preprocess/vocabulary.py", line 66, in decode
    tokens.append(self[ohv])
  File "/home/user/deep-molecular-optimization/preprocess/vocabulary.py", line 23, in __getitem__
    return self._tokens[token_or_id]
KeyError: 121

If I run this a number of times, the key it can't locate changes every time. Am I running this incorrectly?
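For context on what is failing here: `vocab.decode` looks each sampled id up in a dict, so any id the vocabulary does not contain raises `KeyError`. A minimal sketch of a defensive lookup, with a toy id-to-token map (the names and the `<unk>` placeholder are illustrative, not the repo's actual code):

```python
# Toy id -> token map standing in for the Vocabulary in preprocess/vocabulary.py.
tokens = {0: "<pad>", 1: "<start>", 2: "C", 3: "O"}

def safe_decode(ids, tokens, unk="<unk>"):
    """Map each id to its token, substituting a placeholder for unknown ids
    instead of letting a plain dict lookup raise KeyError."""
    return [tokens.get(i, unk) for i in ids]

print(safe_decode([1, 2, 3, 121], tokens))  # ['<start>', 'C', 'O', '<unk>']
```

This only masks the symptom, though; if the model routinely samples ids outside the vocabulary, the vocabulary and checkpoint are likely mismatched (see the maintainer's reply below in this thread's sense: wrong branch/model pairing).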

spadavec avatar Mar 30 '22 19:03 spadavec

I've run into the same error. I tried to wrap this with an exception handler at line 117 in generate.py

try:
    smi = self.tokenizer.untokenize(self.vocab.decode(seq.cpu().numpy()))
    smi = uc.get_canonical_smile(smi)
    smiles.append(smi)
except KeyError:
    print("Key Error")

However, this doesn't appear to have helped: the output only had blank values for TargetMol. Also, the README appears to be out of date; it lists a "--vocab-path" argument to generate.py that doesn't appear to be supported. Can you post a working example?
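One reason the blank TargetMol values appear: catching the `KeyError` but still falling through leaves nothing (or an empty value) for that sample. A sketch of the alternative, skipping failed decodes entirely and counting them (`decode_one` is a hypothetical stand-in for the tokenizer/vocab pipeline, not the repo's API):

```python
def collect_valid(seqs, decode_one):
    """Keep only sequences that decode successfully; count the failures
    instead of emitting blank rows for them."""
    smiles, failures = [], 0
    for seq in seqs:
        try:
            smiles.append(decode_one(seq))
        except KeyError:
            failures += 1  # drop the sample rather than appending a blank
    return smiles, failures

# Toy stand-in for the tokenizer/vocab pipeline.
vocab = {1: "C", 2: "O"}
decode_one = lambda seq: "".join(vocab[i] for i in seq)

smiles, failures = collect_valid([[1, 2], [1, 121], [2]], decode_one)
print(smiles, failures)  # ['CO', 'O'] 1
```

Even with this, a high failure count would still point to the underlying vocab/checkpoint mismatch rather than a sampling fluke.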

PatWalters avatar Apr 08 '22 19:04 PatWalters

Hi @spadavec, I think it might be caused by an inconsistency between the vocab and the model checkpoint. Note that there are two branches for two different publications, and their pre-trained models and vocabs cannot be used interchangeably. For example, if you are running the generation step from the master branch, you can't use a model downloaded from the "general_transformation" branch (which, based on the model path you provided, I guess you are doing).
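A mismatch of this kind can be caught up front by comparing the vocabulary size against the first dimension of the checkpoint's embedding matrix. A hypothetical sketch, using a toy dict of parameter-name to shape in place of a real checkpoint (a real check would `torch.load` the .pt file and inspect `state_dict()` tensor shapes; the key name `embedding.weight` is illustrative):

```python
def check_vocab_matches(vocab_size, param_shapes, embed_key="embedding.weight"):
    """Return True if the embedding table has one row per vocabulary token."""
    rows = param_shapes[embed_key][0]
    return rows == vocab_size

# Toy "checkpoint": a model trained with a 120-token vocabulary.
ckpt_shapes = {"embedding.weight": (120, 256)}

print(check_vocab_matches(120, ckpt_shapes))  # True: consistent pairing
print(check_vocab_matches(158, ckpt_shapes))  # False: mismatched vocab, so
# sampled ids beyond the smaller table (e.g. 121) won't decode
```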

jiazhenhe avatar Aug 01 '22 12:08 jiazhenhe

Hi @PatWalters, can I ask which branch you are using, and whether you are using the model and vocab downloaded from the link in that branch? Also, "--vocab-path" is only supported on the "general_transformation" branch.

jiazhenhe avatar Aug 01 '22 12:08 jiazhenhe

My workflow that worked for me to run generate.py for modifying some SMILES strings:

1. Clone the repository, then check out the branch "general_transformation".
2. Download model.tar.gz and data.tar.gz from https://zenodo.org/record/6319821#.YwSlnnbP2gY (VERSION 2!).
3. Extract the two archives into the folder "deep-molecular-optimization".
4. Run:

python generate.py --model-choice transformer \
    --data-path "/home/mpr/2_git_repos/deep-molecular-optimization/data/MMP" \
    --test-file-name "small" \
    --epoch 60 \
    --save-directory "/home/mpr/2_git_repos/deep-molecular-optimization/experiments/evaluation_transformer" \
    --model-path "/home/mpr/2_git_repos/deep-molecular-optimization/experiments/trained/Transformer-U/MMP/checkpoint" \
    --vocab-path "/home/mpr/2_git_repos/deep-molecular-optimization/data/MMP/vocab.pkl" \
    --without-property

Notes: --data-path is the folder containing small.csv, a small set of SMILES; --test-file-name is the name of that csv file; it is important that vocab.pkl matches the model you use (in this case MMP); --without-property because I didn't need the property prediction part.

This worked very well for me. Thank you for this nice tool!

mpriessner avatar Aug 24 '22 12:08 mpriessner