fragmentation problem
Hi,
I am trying to build my own DB following the instructions, but when I run mols = list(mutate_mol(m, db_name='test.db', max_size=3)) from the tutorial against my own DB, the resulting list of mols is always empty. With the pre-built DB replacements02_sa2.db there is no problem. I used the CHEMBL231.smi file from the example folder and followed the instructions:
fragmentation -i CHEMBL231.smi -o frags.txt -c 32 -v
frag_to_env -i frags.txt -o r3.txt -r 3 -c 32 -v
sort r3.txt | uniq -c > r3_c.txt
env_to_db -i r3_c.txt -o tert.db -r 3 -c -v
and got this result:
root@872fabd400c3:/home/crem# python test.py
[]
Is there something I missed?
Hi,
The described workflow seems OK, except for the DB name (tert.db instead of test.db). If everything else is correct, the issue may be in the structure of your molecule m: it may be that none of its radius-3 contexts are present in test.db (for such a small DB this is possible). Try generating the DB with radius 2 or 1. Does that help?
Hi,
I tried generating the DB with radius 2 or 1, but once it is finished and I execute mols = list(mutate_mol(m, db_name='test.db', max_size=3)), I get the following error:
File "test.py", line 13, in <module>
mols = list(mutate_mol(m, db_name='zink.db', max_size=2))
File "/home/crem/crem/crem.py", line 487, in mutate_mol
for frag_sma, core_sma, freq, ids in __gen_replacements(mol1=mol, mol2=None, db_name=db_name, radius=radius,
File "/home/crem/crem/crem.py", line 344, in __gen_replacements
row_ids = __get_replacements_rowids(cur, env, dist, min_atoms, max_atoms, radius, min_freq, **kwargs)
File "/home/crem/crem/crem.py", line 286, in __get_replacements_rowids
db_cur.execute(sql)
sqlite3.OperationalError: no such table: radius3
I don't get it. Why is it still searching for the radius3 table? Is there some cache that I missed?
I generated the DB as follows:
fragmentation -i CHEMBL231.smi -o frags.txt -c 32 -v
frag_to_env -i frags.txt -o r2.txt -r 2 -c 32 -v
sort r2.txt | uniq -c > r2_c.txt
env_to_db -i r2_c.txt -o test.db -r 2 -c -v
You have to pass radius=2 to the mutate_mol function; otherwise it looks up the radius3 table, as in your traceback.
Just a tip: you can store tables with different radii in the same DB.
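A quick way to see which radii a given DB actually contains is to list its tables. The sketch below uses only the standard sqlite3 module. The "radiusN" table naming is an assumption inferred from the "no such table: radius3" error in the traceback above, and available_radii is a hypothetical helper name, not part of crem.

```python
import re
import sqlite3


def available_radii(db_path):
    """Return the sorted radii N for which db_path has a table named 'radiusN'.

    Assumption: tables are named 'radius1', 'radius2', ... as the
    'no such table: radius3' error message suggests.
    """
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        con.close()

    radii = []
    for (name,) in rows:
        match = re.fullmatch(r"radius(\d+)", name)
        if match:
            radii.append(int(match.group(1)))
    return sorted(radii)


# Example call (the path is whatever was passed to env_to_db -o):
# print(available_radii("test.db"))
```

If the radius you pass to mutate_mol is not in the returned list, the lookup should fail in the same way as in the traceback.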
Oh, sorry. I tried with radius=2 and it works. Maybe radius=3 is too much for such a small DB. Thanks so much for the help!!
Sorry to bother you again, I have another problem with mutate_mol. The original molecule is shown above; its SMILES is 'NC(=N)c4ccc3[nH]c(c2cc(Cl)cc(c1cccc(N)c1)c2O)nc3c4'. I set the protected ids to [9, 10, 11, 12, 13, 14, 22, 23],
so, theoretically, the marked part should stay the same.
But I got a lot of results like this, and
this part has never been changed. I set the radius to 3, using the DB I built from zink250 (radius also set to 3). Did I miss some settings?
- If some part of a molecule is never changed, this can be due to the absence of fragments with the same context in your DB. Reduce the context radius and run again.
- If some protected atoms are changed, most probably you chose the wrong atom ids. Please paste the whole code needed to reproduce the issue, because atom ids in RDKit depend on how you load the molecule.
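To check which atoms the protected ids refer to, you can print the atom indices RDKit assigns after loading. This is a minimal sketch that assumes the molecule is loaded from the SMILES string quoted above; loading from another source (a MOL file, a different SMILES of the same structure) may give a different numbering. atom_table is a hypothetical helper name.

```python
from rdkit import Chem

# SMILES and protected ids as quoted in the question above.
smiles = "NC(=N)c4ccc3[nH]c(c2cc(Cl)cc(c1cccc(N)c1)c2O)nc3c4"
protected_ids = [9, 10, 11, 12, 13, 14, 22, 23]

mol = Chem.MolFromSmiles(smiles)


def atom_table(mol):
    """Return (index, element symbol) pairs in RDKit's atom order."""
    return [(atom.GetIdx(), atom.GetSymbol()) for atom in mol.GetAtoms()]


# Print every atom with its index, so the ids can be compared with a drawing.
for idx, symbol in atom_table(mol):
    print(idx, symbol)

# Element symbols of the atoms that the protected ids point to.
protected_symbols = [mol.GetAtomWithIdx(i).GetSymbol() for i in protected_ids]
print(protected_symbols)
```

If the printed symbols do not match the part of the structure you meant to protect, the ids were taken from a different numbering than the one RDKit uses for this input.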