cdk icon indicating copy to clipboard operation
cdk copied to clipboard

HOSE codes wrong

Open stefhk3 opened this issue 6 years ago • 10 comments

I found that the HOSE codes generated are wrong in some cases. I would expect it affects quite a few actually. The problem occurs if there is more than one atom of an element on a sphere, the atoms attached to these atoms are randomly ordered in the next sphere, not (as it should be) per atom in lower sphere. In some cases of course it could be right be accident. A patch with a test is below. I know what's going wrong in the code, if somebody wants to fix it I can give some hints.

patch.txt

stefhk3 avatar Aug 02 '19 19:08 stefhk3

Thanks will review tomorrow.

johnmay avatar Aug 02 '19 20:08 johnmay

Hi Stefan, just checked and the patch only contains the test. Out of principle I won't add a failing test unless there is a fix to go with it. Based on previous experience writing a failing test and hoping someone else will fix it simply does not work.

johnmay avatar Aug 03 '19 17:08 johnmay

What about putting the patch in a branch, awaiting the fix?

egonw avatar Aug 03 '19 18:08 egonw

I thought it's better to have things documented, but I don't mind.

stefhk3 avatar Aug 03 '19 20:08 stefhk3

I always preferred to have tests. Indeed, if only to have documented that we are aware of a yet unfixed problem. John, another option is to have them @Tag("unfixed") (#JUnit5) and just not run as part of the default test suite?

egonw avatar Aug 03 '19 20:08 egonw

I think this is quite a simple fix, I just don't want to do it as I don't have a use for Hose codes...

johnmay avatar Aug 03 '19 20:08 johnmay

The hierarchal sorting problem is identical to CIP stereochemistry, or Gilleain's signatures library.

johnmay avatar Aug 03 '19 20:08 johnmay

If that code could be unified, that would be nice. Currently the hose code generator used its own way to do things, I think it could be fixed, I had some idea about it. But if you think it could share code, that would be better.

stefhk3 avatar Aug 03 '19 21:08 stefhk3

Stefan, have you tried Gilleain's signatures as replacement of HOSE codes?

egonw avatar Aug 04 '19 06:08 egonw

Picking this up now, I can get close to the expected HOSE code

Expected :C-4;CCC(CY,CY,/)//
Actual   :C-4;CCC(CY,CY,/,,,/)//

Which is much better than the original

Expected :C-4;CCC(CY,CY,/)//
Actual   :C-4;CCC(C,Y,C,Y,/,,,/)//

However looking through the code there is A LOT of bit rot here with lots of sorts that don't do anything :/. Maybe they did something once. I'm half tempted to read the original paper and reimplement from scratch - I now see it would be tricky to just get the layer ordering correct.

John

johnmay avatar Feb 07 '22 22:02 johnmay

Thanks @stefhk3 finally figured out how I wanted to handle the issue with the existing AI/ML being built on the old code. It wasn't clear how we could update the db keys since the source data has mostly been lost or no long available. There is now the option to make new HOSECodeGenerator(LEGACY_MODE); which does the old (sometimes incorrect) behaviour.

johnmay avatar Sep 12 '22 12:09 johnmay

Fixed via #828

johnmay avatar Sep 12 '22 12:09 johnmay