
fragmentation problem

Open suice07 opened this issue 3 years ago • 6 comments

Hi,

I am trying to build my own DB following the instructions, but when I run mols = list(mutate_mol(m, db_name='test.db', max_size=3)) from the tutorial with my own DB, the resulting list of mols is always empty. With the prebuilt DB replacements02_sa2.db there is no problem. I used the CHEMBL231.smi file from the example folder and followed the instructions:

fragmentation -i CHEMBL231.smi -o frags.txt -c 32 -v
frag_to_env -i frags.txt -o r3.txt -r 3 -c 32 -v
sort r3.txt | uniq -c > r3_c.txt
env_to_db -i r3_c.txt -o tert.db -r 3 -c -v

I got this result:

 root@872fabd400c3:/home/crem# python test.py
[]

Is there something I missed?

suice07, Nov 04 '22 02:11

Hi, the described workflow looks OK, except for the DB name (tert.db instead of test.db). If everything else is correct, the issue may be in the structure of your molecule m: it may happen that none of its radius-3 contexts are present in test.db (quite possible for such a small DB). Try generating the DB with radius 2 or 1 and see whether that helps.

DrrDom, Nov 04 '22 08:11
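Rebuilding the DB with radius 2 or 1 means changing the radius in two of the four commands from the first post (frag_to_env and env_to_db), and it is easy to change one and forget the other. The sketch below is a hypothetical helper, not part of CReM: it only regenerates the command strings for a chosen radius, with every flag copied verbatim from the thread, so the radius stays consistent and intermediate files for different radii do not overwrite each other.

```python
from typing import List


def build_db_commands(smi_file: str, db_file: str, radius: int, ncpu: int = 32) -> List[str]:
    """Return the four DB-building commands from this thread for one context radius.

    Hypothetical helper, not part of CReM. The flags are copied verbatim from
    the commands in the thread; only the radius, file names and CPU count vary.
    """
    env_file = f"r{radius}.txt"        # output of frag_to_env for this radius
    counted_file = f"r{radius}_c.txt"  # the same lines, counted by sort | uniq -c
    return [
        f"fragmentation -i {smi_file} -o frags.txt -c {ncpu} -v",
        # the radius must be identical in frag_to_env ...
        f"frag_to_env -i frags.txt -o {env_file} -r {radius} -c {ncpu} -v",
        f"sort {env_file} | uniq -c > {counted_file}",
        # ... and in env_to_db
        f"env_to_db -i {counted_file} -o {db_file} -r {radius} -c -v",
    ]


# Print the commands for a radius-2 DB named test.db.
for cmd in build_db_commands("CHEMBL231.smi", "test.db", radius=2):
    print(cmd)
```

Whatever radius the DB is built with is also the radius that has to be used when querying it.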

Hi,

I tried generating the DB with radius 2 and with radius 1, but once it finished and I executed mols = list(mutate_mol(m, db_name='test.db', max_size=3)), an error occurred:

  File "test.py", line 13, in <module>
    mols = list(mutate_mol(m, db_name='zink.db', max_size=2))
  File "/home/crem/crem/crem.py", line 487, in mutate_mol
    for frag_sma, core_sma, freq, ids in __gen_replacements(mol1=mol, mol2=None, db_name=db_name, radius=radius,
  File "/home/crem/crem/crem.py", line 344, in __gen_replacements
    row_ids = __get_replacements_rowids(cur, env, dist, min_atoms, max_atoms, radius, min_freq, **kwargs)
  File "/home/crem/crem/crem.py", line 286, in __get_replacements_rowids
    db_cur.execute(sql)
sqlite3.OperationalError: no such table: radius3

I don't get why it is still searching for the radius3 table. Is there some cache that I missed?

I generated the DB as follows:

fragmentation -i CHEMBL231.smi -o frags.txt -c 32 -v
frag_to_env -i frags.txt -o r2.txt -r 2 -c 32 -v
sort r2.txt | uniq -c > r2_c.txt
env_to_db -i r2_c.txt -o test.db -r 2 -c -v

suice07, Nov 07 '22 01:11

You have to pass radius=2 to the mutate_mol function. Just a tip: you may store tables with different radii in the same DB.

DrrDom, Nov 07 '22 10:11
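Because one DB can hold tables for several radii, it can help to check which radii a DB actually contains before calling mutate_mol. The sketch below assumes, based only on the error "no such table: radius3" in the traceback above, that each radius is stored in a table named radius<N>; that naming is inferred, not a documented schema. The function names available_radii and check_radius are hypothetical.

```python
import os
import re
import sqlite3
from typing import List


def available_radii(db_path: str) -> List[int]:
    """List the context radii stored in a CReM fragment DB.

    Assumption: one table per radius named radius<N>, as suggested by the
    error "no such table: radius3"; this is inferred, not a documented schema.
    """
    # sqlite3.connect may create an empty DB for a missing path, which would
    # hide a typo such as tert.db vs test.db, so check that the file exists.
    if not os.path.isfile(db_path):
        raise FileNotFoundError(db_path)
    con = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
    finally:
        con.close()
    radii = []
    for name in names:
        match = re.fullmatch(r"radius(\d+)", name)
        if match:
            radii.append(int(match.group(1)))
    return sorted(radii)


def check_radius(db_path: str, radius: int) -> None:
    """Fail early, with a readable message, if db_path has no table for radius."""
    radii = available_radii(db_path)
    if radius not in radii:
        raise ValueError(f"{db_path} has no table for radius {radius}; available: {radii}")
```

For example, check_radius('test.db', 2) before mols = list(mutate_mol(m, db_name='test.db', radius=2, max_size=3)) turns a missing table into an explicit message listing the radii that are present.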

Oh, sorry. I tried with radius=2 and it works; maybe radius=3 is too much for such a small DB. Thanks so much for the help!!

suice07, Nov 09 '22 10:11
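Instead of guessing the radius, one option is to try radii from largest to smallest and keep the first one that yields any replacements: a larger radius requires a larger matching context, so a small DB may return nothing at radius 3 but something at radius 2, as happened here. The sketch is hypothetical and not part of CReM; it takes the mutate function as a parameter (assumed to accept db_name and radius keywords like mutate_mol) so it runs without crem installed. Every radius tried must have a table in the DB, otherwise mutate_mol fails with the "no such table" error seen earlier in the thread.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def mutate_with_fallback(
    mutate: Callable[..., Iterable[Any]],
    mol: Any,
    db_name: str,
    radii: Iterable[int] = (3, 2, 1),
    **kwargs: Any,
) -> Tuple[Optional[int], List[Any]]:
    """Try context radii from largest to smallest; stop at the first non-empty result.

    Hypothetical helper, not part of CReM. `mutate` is assumed to accept
    (mol, db_name=..., radius=..., **kwargs) like crem's mutate_mol.
    Extra keyword arguments (e.g. max_size) are passed through unchanged.
    """
    for radius in radii:
        results = list(mutate(mol, db_name=db_name, radius=radius, **kwargs))
        if results:
            return radius, results
    return None, []  # no radius produced any replacement
```

A possible call, assuming a DB that contains radius-2 and radius-1 tables: radius, mols = mutate_with_fallback(mutate_mol, m, 'test.db', radii=(2, 1), max_size=3). Starting from the largest radius keeps the most context-specific replacements whenever they exist.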

[image] Sorry to bother you again, I have another problem while using mutate_mol. The original molecule looks like the image above; in SMILES format it is 'NC(=N)c4ccc3[nH]c(c2cc(Cl)cc(c1cccc(N)c1)c2O)nc3c4'. I set the protected ids to [9, 10, 11, 12, 13, 14, 22, 23] [image], so theoretically the marked part should stay the same. [image] However, I got a lot of results like this, while [image] this part has never been changed. I set the radius to 3 and used the database I built from zink250 (also with radius 3). Did I miss some settings?

suice07, Nov 11 '22 02:11

  • If some part of a molecule is never changed, this may be due to the absence of fragments with the same context in your DB. Reduce the context radius and run again.
  • If some protected atoms are changed, most probably you chose the wrong atom ids. Paste the whole code to reproduce the issue, because atom ids in RDKit depend on how you load a molecule.

DrrDom, Nov 11 '22 04:11
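To see which atoms a list of protected ids actually points to, the reliable check in RDKit is to loop over mol.GetAtoms() and print atom.GetIdx() with atom.GetSymbol() for the molecule exactly as it is passed to mutate_mol. The stdlib-only sketch below illustrates the idea under one stated assumption: that a molecule parsed directly with Chem.MolFromSmiles is numbered in the order atoms are written in the SMILES string. Molecules loaded from files or rebuilt from canonical SMILES can be numbered differently. The function smiles_atom_order and its regex are hypothetical and cover only bracket atoms, Br/Cl, single-letter organic-subset atoms and *.

```python
import re
from typing import List, Tuple

# Atom tokens in a SMILES string: bracket atoms like [nH], the two-letter
# organic-subset atoms Br/Cl, single-letter aliphatic and aromatic atoms, and *.
# Bonds, branches and ring-closure digits are not atoms and are skipped.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*")


def smiles_atom_order(smiles: str) -> List[Tuple[int, str]]:
    """Number the atoms in the order they are written in the SMILES string.

    Assumption: this matches RDKit's numbering for a molecule parsed directly
    with Chem.MolFromSmiles; other loading routes may renumber atoms.
    """
    return list(enumerate(ATOM_TOKEN.findall(smiles)))


smi = "NC(=N)c4ccc3[nH]c(c2cc(Cl)cc(c1cccc(N)c1)c2O)nc3c4"
protected_ids = [9, 10, 11, 12, 13, 14, 22, 23]
atoms = dict(smiles_atom_order(smi))
print([(i, atoms[i]) for i in protected_ids])
```

Under that assumption, the ids in this thread are the six carbons of the chlorophenol ring (9, 10, 11, 13, 14, 22) plus its Cl (12) and hydroxyl O (23). If the molecule was loaded some other way, the RDKit loop above is the check to trust.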