antlr4
Generated C++ code is ill-formed under the C++20 standard.
A very simple hello grammar file:
grammar Hello ;
firstRule : 'hello' ID ;
ID: [a-z]+ ;
WS : [ \t\r\n]+ -> skip ;
The generated HelloLexer.cpp contains the following code snippet:
std::vector<std::string> HelloLexer::_ruleNames = {
u8"T__0", u8"ID", u8"WS"
};
std::vector<std::string> HelloLexer::_channelNames = {
"DEFAULT_TOKEN_CHANNEL", "HIDDEN"
};
std::vector<std::string> HelloLexer::_modeNames = {
u8"DEFAULT_MODE"
};
std::vector<std::string> HelloLexer::_literalNames = {
"", u8"'hello'"
};
std::vector<std::string> HelloLexer::_symbolicNames = {
"", "", u8"ID", u8"WS"
};
This code raises compile errors when using --std=c++2a. In the current C++20 draft, u8"xxx" literals have type const char8_t[] rather than const char[], so using them to construct a std::string is forbidden.
This Stack Overflow question discusses the issue: C++20 with u8, char8_t and std::string
The proposal P1423R2 gives more details.
The proposal also describes several ways to deal with this case. I simply added -fno-char8_t to my g++ flags to get rid of the problem, but that is only a short-term solution.
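For anyone who wants to keep the u8 prefix without the compiler flag, an explicit conversion is one option. The following is only a minimal sketch: fromUtf8 and makeRuleNames are hypothetical names for illustration and are not part of the ANTLR runtime or the generated code. It assumes the literal's bytes can be copied unchanged into a std::string, and it is written to compile both before and after the char8_t change.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the ANTLR runtime): copies the bytes of a
// UTF-8 string literal into a std::string. In C++20 the literal has type
// const char8_t[N]; before C++20 it is const char[N]. The template accepts
// either, so the same source builds under both standards.
template <typename CharT, std::size_t N>
std::string fromUtf8(const CharT (&literal)[N]) {
  static_assert(sizeof(CharT) == 1, "expects a char or char8_t literal");
  // N counts the terminating null character, which is not copied.
  return std::string(reinterpret_cast<const char*>(literal), N - 1);
}

// Hypothetical stand-in for an initializer like the one in HelloLexer.cpp.
std::vector<std::string> makeRuleNames() {
  return { fromUtf8(u8"T__0"), fromUtf8(u8"ID"), fromUtf8(u8"WS") };
}
```

Dropping the u8 prefix from the generated literals would be simpler still, at the cost of relying on the compiler's execution character set being UTF-8 for non-ASCII names.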
I think we should modify the code generation templates and the C++ runtime to support this change in C++20.
@mike-lischke
OK, if someone could file a PR here we can take a look. Note: I cannot myself test C++20 code currently, so we also need associated test settings for that.
Is this issue fixed by commit 09eb905332c3abe?
I can confirm that it is impossible to compile the generated code.
The first issue is the u8 strings: they can't be converted to std::string because char strings and u8 strings are different types in C++20.
Another problem is the antlr4::atn::SerializedATNView class: it accepts only an int32_t vector or an int32_t* array, so it doesn't accept the type of the _serializedATN variable (it's std::vector<uint16_t>).
It would probably be better to support C++20, given its breaking changes relative to earlier standards, or simply to support both standards (C++17/C++20).
My test case was a generated parser for CSS3, using the grammar from https://github.com/antlr/grammars-v4
As of now, nearly 3 years after the problem was discovered, it's still impossible to compile the generated code.