Code import: Are context sensitive tokens possible in GOLDParser (without replacing the words in a preparser)?

Open GitMensch opened this issue 7 years ago • 3 comments

Given the sample

error.syntax in file "C:\dev\gc-contrib\trunk\samples\worldcities\worldcities3fix.cbl"

Preceding source context:
  289:   
  290:   01  cd-date.
  291:       03  cd-year pic xx.
  292:       03  cd-month pic xx.
  293:       03  cd-day-of-month pic xx.
  294:   
  295:   01  start-seconds pic 9(7)v99.
  296:   01  end-seconds pic 9(7)v99.
  297:   01  elapsed-seconds pic 9(5)v99.
  298:   
  299:   01  » tab pic x value x'09'.

Found token TAB (tab)

Expected: COBOLWord | FILLER | ANY | BASED | BINARY | BINARY_CHAR | BINARY_C_LONG | BINARY_DOUBLE | BINARY_LONG
        | BINARY_SHORT | BLANK | COMP | 'COMP_1' | 'COMP_2' | 'COMP_3' | 'COMP_4' | 'COMP_5' | 'COMP_6'
        | COMP_X | DISPLAY | EXTERNAL | 'FLOAT_BINARY_128' | 'FLOAT_BINARY_32' | 'FLOAT_BINARY_64' | 'FLOAT_DECIMAL_16'
        | 'FLOAT_DECIMAL_34' | FLOAT_LONG | FLOAT_SHORT | GLOBAL | INDEX | IS | JUSTIFIED | LEADING | NATIONAL
        | OCCURS | PACKED_DECIMAL | Picture_Def | POINTER | PROGRAM_POINTER | REDEFINES | SIGN | SIGNED_INT
        | SIGNED_LONG | SIGNED_SHORT | SYNCHRONIZED | TOK_DOT | TRAILING | UNSIGNED_INT | UNSIGNED_LONG
        | UNSIGNED_SHORT | USAGE | VALUE

This error arises only because "TAB" is a token that should be context-sensitive (i.e. it should only be active as a token in a different scope than this one).

The only option I currently see is to tweak the pre-parser so that it replaces all context-sensitive words before parsing (for example by prefixing them with context_reserved_) and to remove that prefix again later on.
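A minimal sketch of that pre-parser workaround, for illustration only. The class `ContextWordMasker`, its methods, and the heuristic "a word directly after a level number is a user-defined data name" are assumptions made for this example; none of them are taken from Structorizer or GOLDParser, and only the word `tab` from the sample is handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Structorizer or GOLDParser. */
class ContextWordMasker {
    static final String PREFIX = "context_reserved_";

    // Assumed heuristic: a word directly after a level number (e.g. "01  tab")
    // is a user-defined data name, even if it collides with a keyword.
    // The lookahead keeps longer names such as "table" or "tab-stop" untouched.
    private static final Pattern DATA_NAME = Pattern.compile(
            "^(\\s*\\d{1,2}\\s+)(tab)(?![A-Za-z0-9-])",
            Pattern.CASE_INSENSITIVE);

    /** Prefixes the colliding word so the grammar sees an ordinary word. */
    static String mask(String line) {
        Matcher m = DATA_NAME.matcher(line);
        return m.find() ? m.replaceFirst("$1" + PREFIX + "$2") : line;
    }

    /** Restores the original name after parsing. */
    static String unmask(String name) {
        return name.startsWith(PREFIX) ? name.substring(PREFIX.length()) : name;
    }
}
```

With this sketch, line 299 of the sample would be handed to the parser as `01  context_reserved_tab pic x value x'09'.`, and `unmask` would turn the name back into `tab` when the diagram is built.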

It would be much nicer if we could instruct the parsing engine to ask us whether the current "token" should be treated as "TOKEN" or as a word (this would also allow) [this will lead to many calls, but seems better since we would not have to tokenize everything on our own].

What would be the reasonable approach for context-sensitive (and/or user-defined word list) tokens?

GitMensch, Nov 27 '17 09:11

@GitMensch

What would be the reasonable approach for context-sensitive (and/or user-defined word list) tokens?

The best thing would of course be to fix it in the grammar, but that is obviously very complicated, if not impossible, with COBOL due to its crude context-dependency. Apart from that, I still have no idea what the most elegant way to cope would be. It is likely to require some deeper insight and a few experimental approaches.

codemanyak, Nov 27 '17 11:11

Since we don't have to verify that the program is syntactically correct (we already state somewhere that imported programs are expected to be), there may be an easier approach:

Parsing currently stops as soon as we hit a syntax error. Given the sample above, the parser tells us there's a TAB token, which in this case is a user-defined word. Can we tell the parser at this point to remove the TAB token from the stack and push a COBOLWord back instead? If so, I might be able to implement something to handle this...
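The decision part of that idea could be sketched as follows. This is a toy model only: `KeywordFallback`, `recover` and the `CONTEXT_SENSITIVE` list are hypothetical names, and whether the GOLD engine lets an adapter swap the rejected token and resume at all is exactly the open question here. The sketch just shows the check on the "Found token" and "Expected" information from the error report.

```java
import java.util.Set;

/** Toy model of the recovery idea; none of these names are GOLDParser API. */
class KeywordFallback {
    /** Keyword symbols that may really be user-defined words (assumed list). */
    static final Set<String> CONTEXT_SENSITIVE = Set.of("TAB");

    /**
     * Decides which symbol to retry with after a syntax error.
     * If the rejected token is a context-sensitive keyword and the parser
     * would have accepted a COBOLWord at this point, retry it as COBOLWord.
     */
    static String recover(String foundSymbol, Set<String> expectedSymbols) {
        if (CONTEXT_SENSITIVE.contains(foundSymbol)
                && expectedSymbols.contains("COBOLWord")) {
            return "COBOLWord";
        }
        return foundSymbol; // genuine syntax error, nothing to retry
    }
}
```

For the error shown at the top, the found symbol is `TAB` and the expected set contains `COBOLWord`, so the sketch would suggest retrying the token as `COBOLWord`; in any position where `COBOLWord` is not expected, the error would be reported unchanged.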

BTW: The question label may be more appropriate than the inconvenient label.

GitMensch, Feb 01 '18 11:02

I haven't had enough time yet to investigate this in detail. I'll let you know when I find a way, or find that there isn't one. Of course, I hesitate to tweak too many things inside the GOLD parser and would rather keep our changes at the adapter level (or otherwise ask the GOLD implementors).

codemanyak, Feb 02 '18 11:02