ORGAN
When running, I have an error
When running, I have an error:
Traceback (most recent call last):
File "example.py", line 9, in
Traceback (most recent call last):
File "example.py", line 18, in
Hi @gaojunhui68,
The two errors clearly come from different sources: the first run uses the molecular metrics, and the second uses the music metrics.
In both cases, it looks like you are using different training sets in the run and the checkpoint, so the engine is unable to decode because the internal dictionary does not recognize the features. If you give me more information (i.e., the actual file that you ran), I'll be able to tell you more.
Regards, Carlos
Hi @couteiral,
Yes, the first is using the molecular metrics, and the second is using the music metrics.
For the first, the code of example.py is below:
import organ
from organ import ORGAN

model = ORGAN('test', 'mol_metrics', params={'PRETRAIN_DIS_EPOCHS': 1})
model.load_training_set('data/toy.csv')
model.set_training_program(['novelty'], [1])
model.load_metrics()
model.train(ckpt_dir='ckpt')
For the second, the code of example.py is below:
from organ import ORGAN
model = ORGAN('test', 'music_metrics')  # Loads a ORGANIC with name 'test', using music metrics
model.load_training_set('data/music_small.txt')  # Loads the training set
model.set_training_program(['tonality'], [50])  # Sets the training program as 50 epochs with the tonality metric
model.load_metrics()  # Loads all the metrics
model.train()  # Proceeds with the training
In both cases, the error occurs in train, after pretraining.
Please help me. Thanks, Junhui Gao
Hi @gaojunhui68,
First, the music metrics seem to be bugged. I am afraid I didn't work on them myself, but I'll get in touch with someone involved, and get back to you.
Regarding the molecular metrics, you get a KeyError, which is the error that a Python dictionary raises when a key not in the dictionary is requested. The following is happening: when you try to decode the embedding coordinates to SMILES strings, you are passing the wrong value to the dictionary, and the code crashes.
In particular, you are passing 'O' to the ord_dict, which is the dictionary containing the mapping from the embedding to the SMILES strings, so something is wrong in there. However, I just ran exactly the same code from the actual repo, and I could not find any problem like yours.
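To make the failure mode concrete, here is a minimal sketch. The tiny ord_dict and this decode are stand-ins written for illustration (the real dictionary is built by ORGAN from the training set, and the real decode also unpads), so treat the names and values as assumptions.

```python
# Hypothetical miniature of the index-to-character dictionary.
ord_dict = {0: 'C', 1: 'O', 2: 'N', 3: '_'}

def decode(ords, ord_dict):
    # Map each integer index back to its character.
    return ''.join(ord_dict[o] for o in ords)

encoded = [1, 0, 0, 2, 3]
decoded = decode(encoded, ord_dict)

try:
    # Passing a string instead of indices iterates over its characters,
    # so the very first lookup is ord_dict['O'], which does not exist.
    decode(decoded, ord_dict)
    error = None
except KeyError as exc:
    error = exc.args[0]
```

The key reported in the KeyError is simply the first character that gets looked up, which would explain why different people see different letters.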
Could you share your pretraining files, so I can have a look at them? Also, are you sure that there is nothing wrong with your 'toy.csv' training set?
Cheers, Carlos
Hi @couteiral,
I had the same error as @gaojunhui68. What I did was:
git clone https://github.com/gablg1/ORGAN.git
pip install -r requirements.txt
python example.py
The error message was
Traceback (most recent call last):
File "example.py", line 8, in <module>
model.train(ckpt_dir='ckpt')
File "/home/yoshikawa/ORGAN/organ/__init__.py", line 763, in train
gen_samples, self.train_samples, self.ord_dict, results)
File "/home/yoshikawa/ORGAN/organ/mol_metrics.py", line 183, in compute_results
results[objective] = np.mean(reward(verified_samples, train_data))
File "/home/yoshikawa/ORGAN/organ/__init__.py", line 743, in batch_reward
for sample in samples]
File "/home/yoshikawa/ORGAN/organ/mol_metrics.py", line 115, in decode
''.join([ord_dict[o] for o in ords]))
KeyError: 'O'
I used pyenv. Neither anaconda3-5.0.0 nor anaconda2-5.0.0 worked.
Hi @couteiral,
Any update on the previous posts? The error message I got is KeyError: 'N'
I was also running example.py. It happened at "model.train(ckpt_dir='ckpt')", after the pre-training finished. It looks like it happened at the same spot as the KeyError: 'O'.
Traceback (most recent call last):
File "
Is there any way to pinpoint which SMILES string caused the problem? Please let me know if you need any further details.
Thanks in advance! Toushi
It looks like this issue is related to the data set. In the toy set, I found more than 30 entries with empty SMILES, i.e. they have NumAtom and Name, but the smiles column is empty. From there I further refined the data set with rdkit. Across these trials I got different KeyErrors: 'C', '[', 'O'. This means the data set still has something wrong, or a filter is needed before processing the data, just like "ORGANIC" does.
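A minimal sketch of the kind of pre-filter mentioned above, using only the standard csv module. The column name 'smiles' and the sample rows are assumptions made up for illustration; they are not the contents of data/toy.csv.

```python
import csv
import io

def split_rows_by_smiles(handle, smiles_column='smiles'):
    """Return (kept, dropped) rows, dropping rows whose SMILES field is empty."""
    kept, dropped = [], []
    for row in csv.DictReader(handle):
        value = (row.get(smiles_column) or '').strip()
        (kept if value else dropped).append(row)
    return kept, dropped

# Made-up sample with one entry that has no SMILES string.
sample = io.StringIO(
    "NumAtom,Name,smiles\n"
    "3,ethanol,CCO\n"
    "5,unknown,\n"
    "2,methanol,CO\n"
)
kept, dropped = split_rows_by_smiles(sample)
```

This only removes empty entries. Checking that the remaining strings are chemically valid would still need a toolkit such as rdkit.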
Hi @couteiral,
I found that the error is due to the function mm.decode(ords, ord_dict). The ords argument may be a string or a list. That is why the error occurs.
I think that there should be 2 different decode() functions.
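As a rough sketch of that idea, a single decode could also branch on the input type instead of being split in two. Everything here is hypothetical: decode_any, the small ord_dict, and the '_' padding character are assumptions for illustration, not the repository's code.

```python
def decode_any(ords, ord_dict, pad_char='_'):
    """Decode a sequence of indices, or pass an already decoded string through."""
    if isinstance(ords, str):
        # Already text: only strip trailing padding.
        return ords.rstrip(pad_char)
    # int(o) also accepts NumPy integer scalars.
    return ''.join(ord_dict[int(o)] for o in ords).rstrip(pad_char)

# Hypothetical index-to-character mapping.
ord_dict = {0: 'C', 1: 'O', 2: 'N', 3: '_'}

from_indices = decode_any([2, 0, 0, 1, 3, 3], ord_dict)
from_string = decode_any('N#CC(O)F', ord_dict)
```

A guard like this hides the symptom rather than explaining why decoded strings reach decode in the first place, so it is a workaround at best.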
Could you run the code with your music_small.txt dataset again to help fix it?
Thanks.
Hi @couteiral,
Here is some simple debugging. If I insert a print in decode, like this:
def decode(ords, ord_dict):
    print(ords)  # check
    return unpad(''.join([ord_dict[o] for o in ords]))
Here are the last few lines printed out before it crashes:
.........
[11 1 2 1 1 2 1 1 2 1 4 11 7 21 21 21]
[ 1 2 11 1 1 9 11 10 2 11 21 21 21 21 21 21]
[ 1 2 1 1 9 8 10 8 21 21 21 21 21 21 21 21]
[11 2 1 1 1 3 4 3 9 7 8 10 6 4 21 21]
[ 8 1 7 3 4 5 5 3 5 6 4 21 21 21 21 21]
N#CC(O)F
Traceback (most recent call last):
  File "/home/trial3/ORGAN/organ/__init__.py", line 763, in train
    gen_samples, self.train_samples, self.ord_dict, results)
  File "/home/trial3/ORGAN/organ/mol_metrics.py", line 185, in compute_results
    results[objective] = np.mean(reward(verified_samples, train_data))
  File "/home/trial3/ORGAN/organ/__init__.py", line 743, in batch_reward
    for sample in samples]
  File "/home/trial3/ORGAN/organ/mol_metrics.py", line 117, in decode
    return unpad(''.join([ord_dict[o] for o in ords]))
KeyError: 'N'
It looks like this is an already decoded SMILES string, which should not be sent to decode again. Any idea what's going on? Thanks! Toushi68
Hello @couteiral
I ran my code line by line to see where the problem occurs. I see that after importing the dataset with model.load_training_set('data/toy.csv') the error comes up.
Traceback (most recent call last):
File "
Any solutions?
I have the same error as @ahylton19.
Traceback (most recent call last):
File "example.py", line 5, in
I have the same error as @Kajiyu.
Traceback (most recent call last):
File "example.py", line 5, in
Any solutions?
We will be updating the repo soon. We expect these changes to be incorporated by Monday, Tuesday at the latest. Stay tuned; they will fix these issues.
I have the same error as @xuzhang5788. It seems they are not going to update it?
I checked the table of SMILES symbols: '.' is listed alongside the bond symbols (it marks a disconnection between fragments), and mol_metrics.py does not include it. To fix the error, please modify line 315 to
chars = chars + ['-', '=', '#', '.']
I guess it works as long as you add '.' to chars.
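To check for other symbols of this kind before training, a small sketch like the following could list every character in the training strings that the vocabulary lacks. The function name, the vocabulary, and the SMILES list are all illustrative assumptions, and the check is purely character-level, so it knows nothing about how ORGAN handles multi-character tokens.

```python
def missing_characters(smiles_list, vocabulary):
    """Map each character absent from the vocabulary to the SMILES that use it."""
    known = set(vocabulary)
    missing = {}
    for smiles in smiles_list:
        for char in set(smiles) - known:
            missing.setdefault(char, []).append(smiles)
    return missing

# Illustrative vocabulary without '.', and made-up training strings.
vocabulary = ['C', 'O', 'N', '(', ')', '-', '=', '#']
smiles_list = ['CCO', 'C=O', 'CC(=O)O.N', 'N#C']

report = missing_characters(smiles_list, vocabulary)
```

An empty report suggests the vocabulary covers the data set at the character level; any key that shows up is a candidate for the chars list.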
When running, I have an error:
Traceback (most recent call last):
File "example.py", line 8, in