I found a bug in the post-processing
Thank you for your MolScribe model. It is very powerful and highly accurate. However, during testing I found a bug in the post-processing.
In this example, (CH2)5 takes the place of the R group, and two bonds are detected around it [chemistry.py line 434]. As a result, the call on line 435,
get_smiles_from_symbol(symbol, mol_w, atom, bonds)
returns
'(=C([H]))C([H])C([H])C([H])C([H])'
The two single bonds around the group were merged into one double bond (the leading "=" in the string above), which causes the mol conversion to fail.
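To make the suspected failure mode concrete, here is a minimal sketch in plain Python. It is not MolScribe's code: the `Bond` class, `total_bond_order`, and `attachment_kind` are hypothetical names I made up for illustration, and the sketch assumes the expansion only looks at the summed bond order of the bonds attached to the abbreviation. Under that assumption, a group attached by one double bond and a group attached by two single bonds look identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond:
    """Hypothetical stand-in for a bond attached to the abbreviation atom."""
    neighbor: int   # index of the atom at the other end of the bond
    order: float    # 1.0 = single, 2.0 = double


def total_bond_order(bonds):
    """Sum of bond orders: the only information kept under the assumption above."""
    return int(sum(b.order for b in bonds))


def attachment_kind(bonds):
    """Classify the attachment by also looking at how many neighbors there are."""
    neighbors = {b.neighbor for b in bonds}
    if len(bonds) == 1:
        return "terminal"    # one attachment point, e.g. =CH2
    if len(neighbors) == len(bonds):
        return "linker"      # several distinct neighbors, e.g. -(CH2)5-
    return "ambiguous"       # repeated neighbor, needs manual inspection


# One double bond to atom 3 versus two single bonds to atoms 3 and 7.
double_bonded = [Bond(neighbor=3, order=2.0)]
two_singles = [Bond(neighbor=3, order=1.0), Bond(neighbor=7, order=1.0)]

# Both cases collapse to the same total, so the distinction is lost.
print(total_bond_order(double_bonded), total_bond_order(two_singles))
# Counting neighbors keeps the two cases apart.
print(attachment_kind(double_bonded), attachment_kind(two_singles))
```

If this is indeed what happens, a fix would probably need to pass the number of attachment points, and not only the total bond order, into the expansion step, so that a two-neighbor group is expanded as a chain with one open valence at each end.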
I'm trying to fix this bug but I don't have a clue yet.
Nice catch! We have tried to implement a postprocessing algorithm to cover common patterns of abbreviations, but I do think it is challenging to cover all cases. If you manage to design a more principled and robust method, I believe it would be a significant contribution.