grammars-v4
C: fail to parse #define
When parsing a C file, the parser fails when it encounters #define
directives. This seems to be related to the lexer (see the discussion here: https://groups.google.com/g/antlr-discussion/c/sIjr9qUx_Rs).
When parsing the code, the parser seems to reach EOF
immediately; the resulting tree contains nothing else:
grun C compilationUnit -gui -tree minishell.c
(compilationUnit <EOF>)
Below is an example of code that does not parse:
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#define BUFSIZE 1024
#define EXIT_COMMAND "exit"
void handle_entry(char* entry) {
    char* binary;
    struct stat stats;
    binary = entry;
    if (stat(binary, &stats)) {
        printf("error while executing binary\n");
        return;
    }
    /* test the owner-execute bit; "!stats.st_mode & X_OK" would be parsed as (!st_mode) & X_OK */
    if (!(stats.st_mode & S_IXUSR)) {
        printf("non executable file\n");
        return;
    }
    pid_t child = fork();
    if (child < 0) {
        perror("fork");
        return;
    }
    if (child == 0) {
        char* argv[2];
        argv[0] = binary;
        argv[1] = NULL;
        printf("executing %s\n", binary);
        execve(binary, argv, NULL);
        /* only reached if execve failed: do not fall back into the shell loop */
        perror("execve");
        _exit(EXIT_FAILURE);
    }
    waitpid(child, NULL, 0);
}
int main(int argc, char* argv[]) {
    char buf[BUFSIZE];
    while (1) {
        printf("minishell>");
        fflush(stdout);
        /* read at most BUFSIZE - 1 characters; stop on end of input */
        int read = scanf("%1023s", buf);
        if (read == EOF) {
            break;
        }
        if (read > 0) {
            if (strncmp(buf, EXIT_COMMAND, 4) == 0) {
                printf("kthxbye\n");
                exit(EXIT_SUCCESS);
            } else {
                handle_entry(buf);
            }
        } else {
            printf("invalid input\n");
        }
    }
    return 0;
}
The ANTLR grammars for C and C++ do not support the preprocessor (any preprocessor directive). It is also not really something you can do in the lexer stage: logically, preprocessor directives belong more to "parse" time, but they have to feed new text back into the lexer's token stream.
I've been working on an "Antlr4::nextToken" override to handle preprocessor directives, and I've seen one other person report working on the same thing. My version has reached the point where it can handle the C++ STL header files, but it cannot yet handle the C++ Boost library headers. Boost stresses preprocessor functionality to a MAJOR extent; one example is macros that invoke another macro, which invokes another macro, and so on.
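To illustrate why a nextToken-style override must feed new tokens back into the stream, and why chains of macros invoking other macros are the hard part, here is a minimal C sketch. It assumes object-like macros only (no function-like macros, no # or ##), input already split into string tokens, and a hard-coded macro table; struct token_source, next_token and expand are hypothetical names, not part of the ANTLR runtime. Every replacement token is rescanned, and a macro that is already being expanded is emitted unchanged, which is how C keeps a self-referential macro from recursing forever.

```c
/* Sketch: object-like macro expansion with rescanning, pulled one token at
 * a time like an overridden nextToken. Hypothetical names; tokens are
 * pre-split strings; function-like macros, # and ## are not handled. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BODY 4     /* replacement tokens per macro, incl. NULL end */
#define MAX_OUT 64     /* expanded tokens produced by one input token */
#define MAX_ACTIVE 16  /* maximum nesting of macro expansions */

struct macro {
    const char *name;
    const char *body[MAX_BODY];  /* NULL-terminated replacement list */
};

/* Hypothetical macro table: BUFSIZE uses PAGE; LOOP refers to itself. */
static const struct macro macros[] = {
    { "BUFSIZE", { "PAGE", "*", "4", NULL } },
    { "PAGE",    { "256", NULL } },
    { "LOOP",    { "LOOP", "+", "1", NULL } },
};

static const struct macro *find_macro(const char *name) {
    for (size_t i = 0; i < sizeof macros / sizeof macros[0]; i++)
        if (strcmp(macros[i].name, name) == 0)
            return &macros[i];
    return NULL;
}

/* Appends the expansion of tok to out. active[0..depth) holds the macros
 * currently being expanded; such a name is emitted unexpanded. */
static void expand(const char *tok, const char **active, int depth,
                   const char **out, size_t *n) {
    const struct macro *m = find_macro(tok);
    int blocked = (m == NULL || depth == MAX_ACTIVE);
    for (int i = 0; !blocked && i < depth; i++)
        if (strcmp(active[i], tok) == 0)
            blocked = 1;
    if (blocked) {
        assert(*n < MAX_OUT);
        out[(*n)++] = tok;
        return;
    }
    active[depth] = m->name;
    for (size_t i = 0; i < MAX_BODY && m->body[i] != NULL; i++)
        expand(m->body[i], active, depth + 1, out, n);  /* rescan */
}

/* Pull-style token source: drain pending expanded tokens first, then read
 * and expand the next raw token. Returns NULL at end of input. */
struct token_source {
    const char **input;  /* NULL-terminated raw tokens */
    size_t pos;
    const char *pending[MAX_OUT];
    size_t head, count;
};

static const char *next_token(struct token_source *ts) {
    while (ts->head == ts->count) {
        const char *raw = ts->input[ts->pos];
        if (raw == NULL)
            return NULL;
        ts->pos++;
        const char *active[MAX_ACTIVE];
        ts->head = ts->count = 0;
        expand(raw, active, 0, ts->pending, &ts->count);
    }
    return ts->pending[ts->head++];
}
```

Pulling tokens from the input char buf [ BUFSIZE ] yields char buf [ 256 * 4 ], because BUFSIZE expands to PAGE * 4 and PAGE is rescanned and expanded in turn, while LOOP expands to LOOP + 1 and stops. A real implementation would also need function-like macros and argument substitution, which this sketch leaves out.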
This is just a hobby for me, so I have no idea whether or when I will have something that can handle the Boost library.
I do not believe the grammar should "support" the preprocessing stages, but it should probably not stop when it encounters a preprocessing directive. It does not stop when it encounters an #include, and it should probably do the same for #define. At the very least, the grammar should be explicit about how it handles preprocessing directives, either:
- do not parse any preprocessing directive (including #include) and stop when encountering one, or
- ignore and/or generate an error message when finding a preprocessing directive (#define, #include, #ifdef).
The current behavior is a mix of these (ignore for #include, stop for #define). I am not arguing for support of preprocessing directives, only for an explicit and consistent way to handle them.
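One explicit and consistent policy is the "ignore and report" option: treat every directive the same way, report it, and hide it from the parser. Below is a minimal C sketch, assuming the source is available as a mutable string and that directives are recognized line by line (a # as the first non-blank character, continued while a line ends in a backslash); blank_directives is a hypothetical helper, not part of grammars-v4. It overwrites directive text with spaces but keeps every newline, so line numbers in later parse errors still match the original file.

```c
/* Sketch: report every preprocessing directive and blank it out so a parser
 * without preprocessor support sees plain C. Hypothetical helper; comments
 * and string literals are not taken into account. */
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Overwrites each directive line in src with spaces, keeping every '\n' so
 * later line numbers still match. A directive starts at a line whose first
 * non-blank character is '#' and continues while a line ends in '\'.
 * Each directive is reported on stderr; returns how many were found. */
static int blank_directives(char *src) {
    assert(src != NULL);
    int count = 0, line = 1, continued = 0;
    char *p = src;
    while (*p != '\0') {
        char *end = strchr(p, '\n');
        if (end == NULL)
            end = p + strlen(p);
        char *q = p;
        while (q < end && (*q == ' ' || *q == '\t'))
            q++;
        int is_directive = continued || (q < end && *q == '#');
        if (is_directive && !continued) {
            /* read the directive name after '#' and optional blanks */
            const char *name = q + 1;
            while (name < end && (*name == ' ' || *name == '\t'))
                name++;
            int len = 0;
            while (name + len < end && isalpha((unsigned char)name[len]))
                len++;
            fprintf(stderr, "line %d: ignoring #%.*s\n", line, len, name);
            count++;
        }
        if (is_directive) {
            const char *last = end;
            if (last > p && last[-1] == '\r')
                last--;  /* tolerate CRLF line endings */
            continued = last > p && last[-1] == '\\';
            memset(p, ' ', (size_t)(end - p));
        }
        if (*end == '\0')
            break;
        p = end + 1;
        line++;
    }
    return count;
}
```

With this approach, macro uses such as BUFSIZE in char buf[BUFSIZE]; survive as plain identifiers, which a C grammar can typically parse as an ordinary expression. Code whose meaning depends on macro expansion or on #ifdef selection would still need a real preprocessor.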
I've been using SourceMonitor on my sources. Its developer has retired, and I'm helping him make SourceMonitor open source. SourceMonitor's parser was created with VisualParse++, which is no longer available, so I'm experimenting with ANTLR4 and the C/C++ grammars to see whether a switch is possible. The sources I'm feeding into the lexer/parser are full of preprocessor directives, so from my point of view handling them should be part of the grammar. During my tests I've added simple rules to skip them. But I suppose that for programs like SourceMonitor it's also important to find unused defines, and perhaps to measure the complexity of macros or something similar.
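As a rough illustration of the unused-defines metric, here is a minimal C sketch that records every #define NAME and counts the other occurrences of each name in the text. It assumes a plain textual scan: string and character literals are skipped but comments are not, and any non-defining occurrence (including #ifdef NAME) counts as a use. count_define_uses and struct define_use are hypothetical names.

```c
/* Sketch: find #define names that are never used elsewhere in a file.
 * Hypothetical helper; skips string/character literals but not comments,
 * and counts any non-defining occurrence (even #ifdef NAME) as a use. */
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 64

struct define_use {
    char name[MAX_NAME];
    int uses;  /* occurrences other than the #define lines themselves */
};

static int is_ident_start(char c) { return isalpha((unsigned char)c) || c == '_'; }
static int is_ident_char(char c) { return isalnum((unsigned char)c) || c == '_'; }

static const char *skip_blanks(const char *p) {
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

static struct define_use *find_def(struct define_use *defs, size_t n,
                                   const char *name, size_t len) {
    for (size_t i = 0; i < n; i++)
        if (strlen(defs[i].name) == len && strncmp(defs[i].name, name, len) == 0)
            return &defs[i];
    return NULL;
}

/* Fills defs (capacity cap) and returns the number of distinct names. */
static size_t count_define_uses(const char *src, struct define_use *defs, size_t cap) {
    assert(src != NULL && defs != NULL);
    size_t n = 0;
    /* Pass 1: record names after "#define"; the definition is not a use. */
    for (const char *line = src; line != NULL && *line != '\0';) {
        const char *p = skip_blanks(line);
        if (*p == '#') {
            p = skip_blanks(p + 1);
            if (strncmp(p, "define", 6) == 0 && !is_ident_char(p[6])) {
                p = skip_blanks(p + 6);
                size_t len = 0;
                while (is_ident_char(p[len]))
                    len++;
                struct define_use *d = find_def(defs, n, p, len);
                if (d != NULL) {
                    d->uses--;  /* a redefinition is not a use either */
                } else if (len > 0 && len < MAX_NAME && n < cap) {
                    memcpy(defs[n].name, p, len);
                    defs[n].name[len] = '\0';
                    defs[n].uses = -1;  /* pass 2 will count this definition */
                    n++;
                }
            }
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    /* Pass 2: count every identifier occurrence of a recorded name. */
    for (const char *p = src; *p != '\0';) {
        if (*p == '"' || *p == '\'') {
            char quote = *p++;
            while (*p != '\0' && *p != quote && *p != '\n') {
                if (*p == '\\' && p[1] != '\0')
                    p++;
                p++;
            }
            if (*p == quote)
                p++;
        } else if (is_ident_start(*p)) {
            size_t len = 0;
            while (is_ident_char(p[len]))
                len++;
            struct define_use *d = find_def(defs, n, p, len);
            if (d != NULL)
                d->uses++;
            p += len;
        } else if (isdigit((unsigned char)*p)) {
            while (is_ident_char(*p) || *p == '.')
                p++;  /* skip numbers such as 0x1F or 1.5f whole */
        } else {
            p++;
        }
    }
    return n;
}
```

Entries whose uses count stays at 0 are candidates for removal. A production tool would more likely work on the lexer's token stream than on raw text, so that comments and inactive #ifdef blocks are handled correctly.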