"pentdec should not have been lexed as two tokens" error
I'm trying to understand the origin of OPSIN's "should not have been lexed as two tokens" parse failures. While reducing a test case I found that:

1-(1-((2-(2,6-DIOXOPIPERIDIN-3-YL)-1,3-DIOXOISOINDOLIN-4-YL)AMINO)-3,6,9,12-TETRAOXA-PENTADECAN-15-OYL)benzene

parses fine, but adding a single prime character:

1'-(1-((2-(2,6-DIOXOPIPERIDIN-3-YL)-1,3-DIOXOISOINDOLIN-4-YL)AMINO)-3,6,9,12-TETRAOXA-PENTADECAN-15-OYL)benzene

results in a confusing internal (parser/lexer backtracking) error. Using biphenyl instead of benzene avoids the error. Any idea what's going wrong?

The full original name (which I'm trying to interpret or correct) is:

5-((6-(1'-(1-((2-(2,6-DIOXOPIPERIDIN-3-YL)-1,3-DIOXOISOINDOLIN-4-YL)AMINO)-3,6,9,12-TETRAOXAPENTADECAN-15-OYL)-2-OXO-1-((1S,3S)-3-(PIPERIDIN-1-YL)CYCLOBUTYL)SPIRO[INDOLINE-3,4'-PIPERIDIN]-6-YL)-3-ISOPROPYL-3H-IMIDAZO[4,5-C]PYRIDIN-4-YL)AMINO)-4-FLUORO-N-ISOPROPYL-2-METHYLBENZAMIDE

Thanks
OPSIN produces two parse trees, but their ordering ends up being arbitrary because both tokenize the input as [pent][a][dec]; the only difference is whether [pent] is an alkaneStemComponent or a multiplier. It happens to try the parse tree where pent is a multiplier first, which gives the error "pentdec should not have been lexed as two tokens!". It then tries the correct parse tree, which gives the error "Could not find atom that: <stereoChemistry locant="1" type="RorS" value="S" stereoGroup="Abs">1S</stereoChemistry> appeared to be referring to". Both errors can be seen if logging is set to debug (or if you run the command-line version in verbose mode).
Typically the later parse trees are incorrect, which is why only the error associated with the first parse tree is reported.
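To make the reporting behaviour concrete, here is a toy model of "try each parse tree in order, return the first success, otherwise report only the first error". It is a sketch of the behaviour described above, not OPSIN's actual code; the class `ParseTreeOrderDemo`, the type `Attempt`, and the method `tryInOrder` are all hypothetical names invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Toy model only: none of these names exist in OPSIN.
class ParseTreeOrderDemo {

    /** Outcome of interpreting one candidate parse tree: a structure or an error. */
    static final class Attempt {
        private final String structure; // null when the attempt failed
        private final String error;     // null when the attempt succeeded

        private Attempt(String structure, String error) {
            this.structure = structure;
            this.error = error;
        }
        static Attempt ok(String structure) { return new Attempt(structure, null); }
        static Attempt fail(String error) { return new Attempt(null, error); }
        boolean succeeded() { return structure != null; }
        String structure() { return structure; }
        String error() { return error; }
    }

    /** Tries candidates in order; if all fail, only the FIRST error is kept. */
    static Attempt tryInOrder(List<Supplier<Attempt>> parseTrees) {
        String firstError = null;
        for (Supplier<Attempt> tree : parseTrees) {
            Attempt attempt = tree.get();
            if (attempt.succeeded()) {
                return attempt;
            }
            if (firstError == null) {
                firstError = attempt.error();
            }
        }
        return Attempt.fail(firstError != null ? firstError : "no parse trees");
    }

    public static void main(String[] args) {
        // Two candidate readings of [pent][a][dec], in the order described above.
        List<Supplier<Attempt>> trees = Arrays.asList(
            () -> Attempt.fail("pentdec should not have been lexed as two tokens!"),
            () -> Attempt.fail("Could not find atom that: ... appeared to be referring to"));
        // The second, more informative error is hidden behind the first one.
        System.out.println(tryInOrder(trees).error());
    }
}
```

In this model the more useful stereochemistry error is never surfaced, because the multiplier reading happened to be tried first and its error was the one retained.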
OPSIN still doesn't support the stereochemistry of that 1,3-substituted cyclobutane. You can get some output by configuring the NameToStructureConfig to warn rather than fail on stereochemistry it cannot interpret, using: n2sconfig.setWarnRatherThanFailOnUninterpretableStereochemistry(true)
which gives: O=C1NC(CCC1N1C(C2=CC=CC(=C2C1=O)NCCOCCOCCOCCOCCC(=O)N1CCC2(CC1)C(N(C1=CC(=CC=C12)C1=CC2=C(C(=N1)NC=1C(=CC(=C(C(=O)NC(C)C)C1)C)F)N(C=N2)C(C)C)C2CC(C2)N2CCCCC2)=O)=O)=O
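For anyone wanting to reproduce this from Java, the configuration call might be wired up roughly as follows. This is a minimal sketch assuming OPSIN 2.x is on the classpath; the class name `WarnOnStereoDemo` and the choice to read the chemical name from the first command-line argument are illustrative, and the exact SMILES and warnings may differ between OPSIN versions.

```java
import uk.ac.cam.ch.wwmm.opsin.NameToStructure;
import uk.ac.cam.ch.wwmm.opsin.NameToStructureConfig;
import uk.ac.cam.ch.wwmm.opsin.OpsinResult;
import uk.ac.cam.ch.wwmm.opsin.OpsinWarning;

// Illustrative wrapper class; requires the OPSIN 2.x jar on the classpath.
class WarnOnStereoDemo {
    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: WarnOnStereoDemo \"<chemical name>\"");
            return;
        }
        String name = args[0];

        NameToStructure n2s = NameToStructure.getInstance();

        // Downgrade uninterpretable stereochemistry from a failure to a warning.
        NameToStructureConfig n2sconfig = new NameToStructureConfig();
        n2sconfig.setWarnRatherThanFailOnUninterpretableStereochemistry(true);

        OpsinResult result = n2s.parseChemicalName(name, n2sconfig);

        // Status is SUCCESS, WARNING or FAILURE; the message explains the latter two.
        System.out.println("status:  " + result.getStatus());
        System.out.println("message: " + result.getMessage());
        for (OpsinWarning warning : result.getWarnings()) {
            System.out.println("warning: " + warning.getMessage());
        }
        // getSmiles() returns null if no structure could be generated.
        System.out.println("SMILES:  " + result.getSmiles());
    }
}
```

With this setting the returned structure omits the stereocentres OPSIN could not assign, so the warnings are worth checking before using the SMILES.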