cdk
cdk copied to clipboard
HOSE codes wrong
I found that the HOSE codes generated are wrong in some cases. I would expect it affects quite a few actually. The problem occurs if there is more than one atom of an element on a sphere, the atoms attached to these atoms are randomly ordered in the next sphere, not (as it should be) per atom in lower sphere. In some cases of course it could be right be accident. A patch with a test is below. I know what's going wrong in the code, if somebody wants to fix it I can give some hints.
Thanks will review tomorrow.
Hi Stefan, just checked and the patch only contains the test. Out of principle I won't add a failing test unless there is a fix to go with it. Based on previous experience writing a failing test and hoping someone else will fix it simply does not work.
What about putting the patch in a branch, awaiting the fix?
I thought it's better to have things documented, but I don't mind.
I always preferred to have tests. Indeed, if only to have documented that we are aware of a yet unfixed problem. John, another option is to have them @Tag("unfixed") (#JUnit5) and just not run as part of the default test suite?
I think this is quite a simple fix, I just don't want to do it as I don't have a use for Hose codes...
The hierarchal sorting problem is identical to CIP stereochemistry, or Gilleain's signatures library.
If that code could be unified, that would be nice. Currently the hose code generator used its own way to do things, I think it could be fixed, I had some idea about it. But if you think it could share code, that would be better.
Stefan, have you tried Gilleain's signatures as replacement of HOSE codes?
Picking this up now, I can get close to the expected HOSE code
Expected :C-4;CCC(CY,CY,/)//
Actual :C-4;CCC(CY,CY,/,,,/)//
Which is much better than the original
Expected :C-4;CCC(CY,CY,/)//
Actual :C-4;CCC(C,Y,C,Y,/,,,/)//
However looking through the code there is A LOT of bit rot here with lots of sorts that don't do anything :/. Maybe they did something once. I'm half tempted to read the original paper and reimplement from scratch - I now see it would be tricky to just get the layer ordering correct.
John
Thanks @stefhk3 finally figured out how I wanted to handle the issue with the existing AI/ML being built on the old code. It wasn't clear how we could update the db keys since the source data has mostly been lost or no long available. There is now the option to make new HOSECodeGenerator(LEGACY_MODE); which does the old (sometimes incorrect) behaviour.
Fixed via #828