offsetof parsing fails due to TYPEID as offsetof_member_designator

Open • nxmaintainer opened this issue 2 years ago • 1 comment

I'm parsing cpython/Objects/exceptions.c with pycparser==2.21 (from PyPI). The file was preprocessed into exceptions.i with `cpp -nostdinc -E -P -DPy_BUILD_CORE=1 -D_POSIX_THREADS=1`, plus the standard includes and fake_libc_include. Nothing special or tricky.

Parsing fails in this block:

static PyMemberDef UnicodeError_members[] = {
{"encoding", 6, offsetof(PyUnicodeErrorObject, encoding), 0,       // <- parsed correctly
"exception encoding"},
{"object", 6, offsetof(PyUnicodeErrorObject, object), 0,           // <- fails
"exception object"},
{"start", 19, offsetof(PyUnicodeErrorObject, start), 0,
"exception start"},
{"end", 19, offsetof(PyUnicodeErrorObject, end), 0,
"exception end"},
{"reason", 6, offsetof(PyUnicodeErrorObject, reason), 0,
"exception reason"},
{0}
};

Specifically, it fails on `offsetof(PyUnicodeErrorObject, object)` with `pycparser.plyparser.ParseError: :9792:46: before: object`.

Parsing succeeds if I rename the `object` field to anything else, or if I replace the `offsetof` call. According to the debug output, the first `offsetof` in this block (the `encoding` field) and the next one (the `object` field) are parsed differently.

For `encoding`, note `LexToken(ID,'encoding',9790,365338)` near the end:
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA . LexToken(OFFSETOF,'offsetof',9790,365307)
Action : Reduce rule [empty -> <empty>] with [] and goto state 533
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 533
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA empty . LexToken(OFFSETOF,'offsetof',9790,365307)
Action : Reduce rule [designation_opt -> empty] with [None] and goto state 532
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 532
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt . LexToken(OFFSETOF,'offsetof',9790,365307)
Action : Shift and goto state 165
State  : 165
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF . LexToken(LPAREN,'(',9790,365315)
Action : Shift and goto state 304
State  : 304
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN . LexToken(TYPEID,'PyUnicodeErrorObject',9790,365316)
Action : Shift and goto state 35
State  : 35
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN TYPEID . LexToken(COMMA,',',9790,365336)
Action : Reduce rule [typedef_name -> TYPEID] with [<str @ 0x7f13b5a11970>] and goto state 31
Result : <IdentifierType @ 0x7f13b5a14f50> (IdentifierType(names=['PyUnicodeErrorObj ...)
State  : 31
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN typedef_name . LexToken(COMMA,',',9790,365336)
Action : Reduce rule [type_specifier -> typedef_name] with [<IdentifierType @ 0x7f13b5a14f50>] and goto state 212
Result : <IdentifierType @ 0x7f13b5a14f50> (IdentifierType(names=['PyUnicodeErrorObj ...)
State  : 212
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_specifier . LexToken(COMMA,',',9790,365336)
Action : Reduce rule [specifier_qualifier_list -> type_specifier] with [<IdentifierType @ 0x7f13b5a14f50>] and goto state 216
Result : <dict @ 0x7f13b5a11700> ({'qual': [], 'storage': [], 'type': [Ide ...)
State  : 216
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN specifier_qualifier_list . LexToken(COMMA,',',9790,365336)
Action : Reduce rule [empty -> <empty>] with [] and goto state 320
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 320
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN specifier_qualifier_list empty . LexToken(COMMA,',',9790,365336)
Action : Reduce rule [abstract_declarator_opt -> empty] with [None] and goto state 350
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 350
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN specifier_qualifier_list abstract_declarator_opt . LexToken(COMMA,',',9790,365336)
Action : Reduce rule [type_name -> specifier_qualifier_list abstract_declarator_opt] with [<dict @ 0x7f13b5a11700>,None] and goto state 438
Result : <Typename @ 0x7f13b5e9fcb0> (Typename(name=None,quals=[],align=None,t ...)
State  : 438
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name . LexToken(COMMA,',',9790,365336)
Action : Shift and goto state 507
State  : 507
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA . LexToken(ID,'encoding',9790,365338)
Action : Shift and goto state 159
State  : 159
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA ID . LexToken(RPAREN,')',9790,365346)
Action : Reduce rule [identifier -> ID] with ['encoding'] and goto state 541
Result : <ID @ 0x7f13b5a14ff0> (ID(name='encoding'))
State  : 541
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA identifier . LexToken(RPAREN,')',9790,365346)
Action : Reduce rule [offsetof_member_designator -> identifier] with [<ID @ 0x7f13b5a14ff0>] and goto state 540
Result : <ID @ 0x7f13b5a14ff0> (ID(name='encoding'))
State  : 540
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA offsetof_member_designator . LexToken(RPAREN,')',9790,365346)
Action : Shift and goto state 563
State  : 563
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA offsetof_member_designator RPAREN . LexToken(COMMA,',',9790,365347)
Action : Reduce rule [primary_expression -> OFFSETOF LPAREN type_name COMMA offsetof_member_designator RPAREN] with ['offsetof','(',<Typename @ 0x7f13b5e9fcb0>,',',<ID @ 0x7f13b5a14ff0>,')'] and goto state 158
Result : <FuncCall @ 0x7f13b5a06ed0> (FuncCall(name=ID(name='offsetof'),args=E ...)
State  : 158

For `object`, note `LexToken(TYPEID,'object',9792,365420)` in the position where `encoding` has `ID`:
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA . LexToken(OFFSETOF,'offsetof',9792,365389)
Action : Reduce rule [empty -> <empty>] with [] and goto state 533
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 533
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA empty . LexToken(OFFSETOF,'offsetof',9792,365389)
Action : Reduce rule [designation_opt -> empty] with [None] and goto state 532
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 532
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt . LexToken(OFFSETOF,'offsetof',9792,365389)
Action : Shift and goto state 165
State  : 165
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF . LexToken(LPAREN,'(',9792,365397)
Action : Shift and goto state 304
State  : 304
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN . LexToken(TYPEID,'PyUnicodeErrorObject',9792,365398)
Action : Shift and goto state 35
State  : 35
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN TYPEID . LexToken(COMMA,',',9792,365418)
Action : Reduce rule [typedef_name -> TYPEID] with [<str @ 0x7f13b5a116f0>] and goto state 31
Result : <IdentifierType @ 0x7f13b5a153b0> (IdentifierType(names=['PyUnicodeErrorObj ...)
State  : 31
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN typedef_name . LexToken(COMMA,',',9792,365418)
Action : Reduce rule [type_specifier -> typedef_name] with [<IdentifierType @ 0x7f13b5a153b0>] and goto state 212
Result : <IdentifierType @ 0x7f13b5a153b0> (IdentifierType(names=['PyUnicodeErrorObj ...)
State  : 212
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_specifier . LexToken(COMMA,',',9792,365418)
Action : Reduce rule [specifier_qualifier_list -> type_specifier] with [<IdentifierType @ 0x7f13b5a153b0>] and goto state 216
Result : <dict @ 0x7f13b5a11280> ({'qual': [], 'storage': [], 'type': [Ide ...)
State  : 216
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN specifier_qualifier_list . LexToken(COMMA,',',9792,365418)
Action : Reduce rule [empty -> <empty>] with [] and goto state 320
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 320
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN specifier_qualifier_list empty . LexToken(COMMA,',',9792,365418)
Action : Reduce rule [abstract_declarator_opt -> empty] with [None] and goto state 350
Result : <NoneType @ 0x7f13b7ec6280> (None)
State  : 350
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN specifier_qualifier_list abstract_declarator_opt . LexToken(COMMA,',',9792,365418)
Action : Reduce rule [type_name -> specifier_qualifier_list abstract_declarator_opt] with [<dict @ 0x7f13b5a11280>,None] and goto state 438
Result : <Typename @ 0x7f13b5e9fa10> (Typename(name=None,quals=[],align=None,t ...)
State  : 438
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name . LexToken(COMMA,',',9792,365418)
Action : Shift and goto state 507
State  : 507
Stack  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA . LexToken(TYPEID,'object',9792,365420)
ERROR: Error  : translation_unit declaration_specifiers declarator EQUALS brace_open initializer_list COMMA designation_opt brace_open initializer_list COMMA designation_opt OFFSETOF LPAREN type_name COMMA . 

I think I roughly understand the issue: `object` is lexed as TYPEID for some reason (I checked and found no `object` type defined or declared in the preprocessed file). It therefore doesn't match the offsetof_member_designator rule, which requires identifier (that is, ID), and the primary OFFSETOF expression fails. I even have a dirty fix, like this:

    def p_offsetof_identifier(self, p):
        """ offsetof_identifier  : ID
                                   | TYPEID
        """
        p[0] = c_ast.ID(p[1], self._token_coord(p, 1))

    def p_offsetof_member_designator(self, p):
        """ offsetof_member_designator : offsetof_identifier
                                         | offsetof_member_designator PERIOD offsetof_identifier
                                         | offsetof_member_designator LBRACKET expression RBRACKET
        """
        if len(p) == 2:
            p[0] = p[1]
        ...

But I don't think this is the right approach; the issue looks deeper (`object` shouldn't be a TYPEID in this context in the first place, should it?). @eliben / @Ksero, I'd really appreciate it if you could point me to a better solution. I'd be happy to contribute.
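
To illustrate why a member name can come out as TYPEID, here is a toy model of a lexer that classifies identifiers purely by looking them up in a scope stack, without regard to the surrounding tokens. This is a simplified sketch, not pycparser's source: `ToyScopes`, `classify`, and `KEYWORDS` are hypothetical names, and registering `object` as a typedef is an assumption made only to reproduce the symptom.

```python
# Toy model (not pycparser source): identifiers are classified only by a
# scope stack of {name: is_typedef} dictionaries, never by syntactic context.

KEYWORDS = {"offsetof"}


class ToyScopes:
    def __init__(self):
        self.stack = [{}]  # the innermost scope is the last element

    def add_typedef(self, name):
        self.stack[-1][name] = True

    def is_type(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.stack):
            if name in scope:
                return scope[name]
        return False


def classify(tokens, scopes):
    """Return (kind, text) pairs for a list of token strings."""
    out = []
    for text in tokens:
        if text in KEYWORDS:
            kind = text.upper()
        elif text.isidentifier():
            kind = "TYPEID" if scopes.is_type(text) else "ID"
        else:
            kind = "PUNCT"
        out.append((kind, text))
    return out


scopes = ToyScopes()
scopes.add_typedef("PyUnicodeErrorObject")
tokens = ["offsetof", "(", "PyUnicodeErrorObject", ",", "object", ")"]

before = classify(tokens, scopes)
# Hypothetical: some earlier declaration registered the name as a type.
scopes.add_typedef("object")
after = classify(tokens, scopes)

print(before[4])  # member name while "object" is unknown to the scope stack
print(after[4])   # same token once "object" is registered as a typedef
```

In this model the member-name position makes no difference to the lexer, so a grammar rule that only accepts ID at that position rejects the token as soon as the name is registered as a type.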

P.S. Please use exceptions.i for tests. I tried to make a smaller reproducible sample, but it parses fine.

nxmaintainer • Apr 30 '23 13:04

Thanks for the detailed report.

To narrow this down further, you can insert a printout (or a stack trace) at the point where CParser adds `object` to the type map, after which it is treated as a TYPEID. That should tell us why the parser thinks it's a previously declared type.
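
One way to get such a stack trace without editing the installed package is to wrap the relevant method at runtime. The sketch below is a generic helper, demonstrated on a stand-in class so it runs without pycparser installed. `watch_method` and `FakeParser` are hypothetical names. That the method to wrap in pycparser 2.21 is `CParser._add_typedef_name` is an assumption to verify against the installed `c_parser.py`.

```python
import functools
import traceback


def watch_method(cls, method_name, wanted):
    """Wrap cls.method_name so that any call whose first argument equals
    `wanted` records and prints the call stack. Returns the list of stacks."""
    original = getattr(cls, method_name)
    hits = []

    @functools.wraps(original)
    def wrapper(self, first, *args, **kwargs):
        if first == wanted:
            stack = "".join(traceback.format_stack(limit=12))
            hits.append(stack)
            print(f"{method_name}({first!r}, ...) called from:\n{stack}")
        return original(self, first, *args, **kwargs)

    setattr(cls, method_name, wrapper)
    return hits


# Stand-in class so the sketch runs on its own; it is not pycparser code.
class FakeParser:
    def __init__(self):
        self.types = {}

    def _add_typedef_name(self, name, coord):
        self.types[name] = True


hits = watch_method(FakeParser, "_add_typedef_name", "object")
parser = FakeParser()
parser._add_typedef_name("PyObject", None)  # not recorded
parser._add_typedef_name("object", None)    # recorded with a stack trace
```

Against the real library, the call would presumably look like `watch_method(pycparser.c_parser.CParser, "_add_typedef_name", "object")`, issued before parsing exceptions.i. The printed stack should then show which grammar rule registered the name.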

eliben • May 05 '23 13:05