deepsmiles

Consider compressing parentheses

Open baoilleach opened this issue 7 years ago • 2 comments

It has been mentioned that replacing a run of multiple close parentheses with a number plus a single parenthesis would be a good compression strategy. This is of course true. What I don't know is whether it would make the string easier for an ML method to use, learn, or generate. But I guess I can add an option to control this.

In the meantime, maybe I can provide a piece of Python code that does the transformation for anyone interested.
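For anyone who wants to experiment before such an option exists, here is a minimal sketch of the transformation. It is not part of the deepsmiles package, and the function names are made up for illustration. One assumption to flag loudly: a bare count before the parenthesis would be indistinguishable from a ring-size digit that happens to precede the run, so the sketch writes the count in curly braces (`{n})`), on the assumption that curly braces do not otherwise appear in the string. The `min_run` threshold is also an assumption, chosen so that only runs that actually get shorter are rewritten.

```python
import re


def compress_parens(s: str, min_run: int = 5) -> str:
    """Replace each run of at least `min_run` ')' characters with '{n})'.

    The '{n})' notation is a hypothetical choice for this sketch, not an
    existing DeepSMILES feature.
    """
    pattern = r"\){%d,}" % min_run
    return re.sub(pattern, lambda m: "{%d})" % len(m.group(0)), s)


def expand_parens(s: str) -> str:
    """Inverse of compress_parens: turn every '{n})' back into n ')' characters."""
    return re.sub(r"\{(\d+)\}\)", lambda m: ")" * int(m.group(1)), s)


if __name__ == "__main__":
    example = "cccccc6))))))C"  # illustrative string only
    packed = compress_parens(example)
    print(packed)  # cccccc6{6})C
    assert expand_parens(packed) == example
```

Runs shorter than `min_run` are left alone, so the round trip through `expand_parens` restores the original string either way.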

baoilleach, Sep 28 '18 10:09

Or, going the other way, use "%%%%" instead of "%3". "CCCC%" would be "CC1CC1", "CCCC%%" would be "C1CCC1", etc.

As Noel writes, just need some way to evaluate which is more effective.

adalke, Jul 17 '19 09:07

I agree that the runs of multiple consecutive parentheses are the only weird thing in the proposed syntax. If ML can "understand" ring size, I guess that it could also understand "branch length".

UnixJunkie, May 24 '21 08:05