grammars-v4

C: fail to parse #define

Open juli1 opened this issue 4 years ago • 3 comments

Parsing a C file fails as soon as the parser encounters a #define directive. This appears to be related to the lexer (see the discussion here: https://groups.google.com/g/antlr-discussion/c/sIjr9qUx_Rs).

When parsing the code, the parser appears to reach EOF immediately, so the resulting tree is empty:

grun C compilationUnit -gui -tree minishell.c
(compilationUnit <EOF>)

The following code does not parse:

#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>

#define BUFSIZE 1024
#define EXIT_COMMAND "exit"

void handle_entry(char* entry) {
    char* binary;
    struct stat stats;

    binary = entry;

    if(stat(binary, &stats)) {
        printf("error while executing binary\n");
        return;
    }

    if (!(stats.st_mode & X_OK)) {
        printf("non executable file\n");
        return;
    }

    int child = fork();
    if (child == 0) {
        char* argv[2];
        argv[0] = binary;
        argv[1] = NULL;
        printf("executing %s\n", binary);
        execve(binary, argv, NULL);
    }
    waitpid(child, NULL, 0);
}


int main(int argc, char* argv[]) {

    char buf[BUFSIZE];

    while(1) {
        printf("minishell>");
        if(scanf("%s", buf) > 0) {
            if(strncmp(buf, EXIT_COMMAND, 4) == 0) {
                printf("kthxbye\n");
                exit(EXIT_SUCCESS);
            } else {
                handle_entry(buf);
            }
        } else {
            printf("invalid input\n");
        }
    }
    
    return 0;
}

juli1 • Dec 29 '20 23:12

The ANTLR grammars for C and C++ do not support the preprocessor (any preprocessor directive). It is also not really something you can do in the lexer stage. Logically, preprocessor directives belong closer to parse time, but their expansions have to be fed back into the lexer's token stream as new text.

I've been working on an "Antlr4::nextToken" override to handle preprocessor directives. I've also seen one other person report working on the same thing. My version has reached the point where it can handle the C++ STL header files. It cannot yet handle the C++ Boost library header files, though. The Boost library stresses preprocessor functionality to a MAJOR extent. One example is preprocessor macros that invoke another macro, which invokes yet another macro, and so on.
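To make the "macro that invokes another macro" problem concrete, here is a minimal C sketch. The macro and function names are hypothetical and are not taken from Boost. A parser that only skips directives would see the call `MAKE_NAME(1)` and nothing else; only a real preprocessor performs the repeated rescans that turn it into an identifier.

```c
/* Hypothetical macros for illustration; none of these names come from Boost. */
#define CAT_IMPL(a, b) a##b          /* pastes two tokens together */
#define CAT(a, b) CAT_IMPL(a, b)     /* extra level so the arguments expand first */
#define PREFIX value_
#define MAKE_NAME(n) CAT(PREFIX, n)  /* a macro invoking a macro invoking a macro */

int nested_macro_demo(void) {
    int value_1 = 41;
    /* MAKE_NAME(1) -> CAT(PREFIX, 1) -> CAT_IMPL(value_, 1) -> value_1 */
    return MAKE_NAME(1) + 1;
}
```

Calling `nested_macro_demo()` returns 42, because the compiler sees `value_1 + 1` after expansion. The indirection through `CAT_IMPL` is needed because operands of `##` are not macro-expanded before pasting.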

For me this is just a hobby, so I have no idea whether/when I will get something that can handle the Boost library.

mycroft8 • Jan 01 '21 03:01

I do not believe the grammar should "support" the preprocessing stages, but it should probably not stop when it encounters a preprocessing directive. It does not stop when it encounters an #include and should probably do the same for #define. At the very least, the grammar should be explicit about how it handles preprocessing directives, using one of these approaches:

  • do not parse any preprocessing directive (including #include) and stop when encountering one
  • ignore and/or generate an error message when finding a preprocessing directive (#define, #include, #ifdef)

The current behavior is a mix of these (ignore for #include and stop for #define). I am not arguing for support of preprocessing directives, but there should at least be an explicit and consistent way to handle them.

juli1 • Jan 01 '21 17:01

I've been using SourceMonitor on my sources. The developer has retired, and I'm helping him turn SourceMonitor into an open-source project. The SourceMonitor parser was created using VisualParse++, which is no longer available. I'm experimenting with ANTLR4 and the C/C++ grammars to see whether a switch is possible. The sources I'm feeding into the lexer/parser are full of preprocessor directives. From my point of view, handling them should be part of the grammar. During my tests I've added simple rules to skip them. But I suppose that for programs like SourceMonitor it's also important to detect unused defines, and perhaps to measure the complexity of macros or something similar.
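As a rough illustration of the "skip them" approach, the same effect can be had outside the grammar with a small pre-pass. The sketch below is an assumption-laden example, not part of ANTLR or grammars-v4: the function name `blank_directives` is hypothetical, it assumes `\n` line endings, and it treats any line whose first non-blank character is `#` as a directive (so a `#` at the start of a line inside a multi-line comment or string would also be blanked). It overwrites directives with spaces and keeps the newlines, so line numbers reported by the parser still match the original file.

```c
#include <stddef.h>

/*
 * Hypothetical pre-pass: blank out every preprocessor directive line in
 * `src`, in place, including backslash-continued lines. Newlines are kept.
 * Returns the number of directives that were blanked.
 */
size_t blank_directives(char *src) {
    size_t count = 0;
    char *p = src;

    while (*p != '\0') {
        /* Find the first non-blank character of this line. */
        char *q = p;
        while (*q == ' ' || *q == '\t') {
            q++;
        }

        int is_directive = (*q == '#');
        if (is_directive) {
            count++;
        }

        /* Walk to the end of the logical line. */
        while (*p != '\0' && *p != '\n') {
            /* A backslash right before the newline continues the directive. */
            if (is_directive && *p == '\\' && p[1] == '\n') {
                *p = ' ';
                p += 2;   /* step over the newline but keep it in the text */
                continue;
            }
            if (is_directive) {
                *p = ' ';
            }
            p++;
        }
        if (*p == '\n') {
            p++;
        }
    }
    return count;
}
```

The buffer must be writable (for example a `char[]` read from the file, not a string literal). Because the directives are discarded rather than recorded, this sketch cannot answer questions such as which defines are unused; that would need the directives kept on a separate channel or in a side table.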

ThirtySomething • Oct 05 '21 06:10