nsiqcppstyle Overloaded operator function name should be returned as one FUNCTION token, not two tokens FUNCTION + <operator>

The Problem

When NsiqCppStyle parses the following code snippet:

Event& Event::operator=(OUT Event& other)
{
}

The internal token list (accessed via the lexer.tokenlist member variable) looks like this:

tokenlist = {list: 19}
 00 = {LexToken} LexToken(ID,'Event',1,1,0, False, None)
 01 = {LexToken} LexToken(AND,'&',1,6,5, False, None)
 02 = {LexToken} LexToken(SPACE,' ',1,7,6, False, None)
 03 = {LexToken} LexToken(ID,'Event',1,8,7, False, None)
 04 = {LexToken} LexToken(DOUBLECOLON,'::',1,13,12, False, None)
 05 = {LexToken} LexToken(FUNCTION,'operator',1,15,14, False, None)
 06 = {LexToken} LexToken(EQUALS,'=',1,23,22, False, None)
 07 = {LexToken} LexToken(LPAREN,'(',1,24,23, False, None)
 08 = {LexToken} LexToken(ID,'OUT',1,25,24, False, None)
 09 = {LexToken} LexToken(SPACE,' ',1,28,27, False, None)
 10 = {LexToken} LexToken(ID,'Event',1,29,28, False, None)
 11 = {LexToken} LexToken(AND,'&',1,34,33, False, None)
 12 = {LexToken} LexToken(SPACE,' ',1,35,34, False, None)
 13 = {LexToken} LexToken(ID,'other',1,36,35, False, None)
 14 = {LexToken} LexToken(RPAREN,')',1,41,40, False, None)
 15 = {LexToken} LexToken(LINEFEED,'\n',1,42,41, False, None)
 16 = {LexToken} LexToken(LBRACE,'{',2,1,42, False, None)
 17 = {LexToken} LexToken(LINEFEED,'\n',2,2,43, False, None)
 18 = {LexToken} LexToken(RBRACE,'}',3,1,44, False, None)

The problem is that NsiqCppStyle returns the overloaded operator function name ("operator=") as two tokens, i.e., a FUNCTION ("operator" at offset 5) and an EQUALS ("=" at offset 6). This is incorrect - the overloaded operator function name should be returned as one FUNCTION token with a value of "operator=":

tokenlist = {list: 18}
 00 = {LexToken} LexToken(ID,'Event',1,1,0, False, None)
 01 = {LexToken} LexToken(AND,'&',1,6,5, False, None)
 02 = {LexToken} LexToken(SPACE,' ',1,7,6, False, None)
 03 = {LexToken} LexToken(ID,'Event',1,8,7, False, None)
 04 = {LexToken} LexToken(DOUBLECOLON,'::',1,13,12, False, None)
 05 = {LexToken} LexToken(FUNCTION,'operator=',1,15,14, False, None)
 06 = {LexToken} LexToken(LPAREN,'(',1,24,23, False, None)
 07 = {LexToken} LexToken(ID,'OUT',1,25,24, False, None)
 08 = {LexToken} LexToken(SPACE,' ',1,28,27, False, None)
 09 = {LexToken} LexToken(ID,'Event',1,29,28, False, None)
 10 = {LexToken} LexToken(SPACE,' ',1,34,33, False, None)
 11 = {LexToken} LexToken(AND,'&',1,35,34, False, None)
 12 = {LexToken} LexToken(ID,'other',1,36,35, False, None)
 13 = {LexToken} LexToken(RPAREN,')',1,41,40, False, None)
 14 = {LexToken} LexToken(LINEFEED,'\n',1,42,41, False, None)
 15 = {LexToken} LexToken(LBRACE,'{',2,1,42, False, None)
 16 = {LexToken} LexToken(LINEFEED,'\n',2,2,43, False, None)
 17 = {LexToken} LexToken(RBRACE,'}',3,1,44, False, None)

This is the correct way to parse the overloaded operator function name - tokenlist[5] shows the FUNCTION token with a value of "operator=".

Jun 30 '22 15:06 mosherubin

🤔 This would make some parses a bit more complicated. Is there any situation where such a thing would help?

Current:

FUNCTION + OPERATOR = operator overload
FUNCTION = normal function

New:

FUNCTION = normal function or operator overload
To detect operator overload, remove the operator prefix if it's there

IMO, the current version supports a simpler user implementation. Some example usage supporting your preference would be really nice :)

Jun 30 '22 16:06 kunaltyagi

Funny you should ask ;-) The reason for this change, made in my private cloned repo, was because of a proprietary rule I wrote for my company.

My company has a coding standard where every function begins and ends with a call to the internal logging system, tracing the entry and exit in the function. The string passed to the logging system (a) Has a '+' or '-' to denote entry/exit respectively, and (b) following the +/- character must be the full and precise function name. Since all of our functions have a single point of exit, both logging statements are always logged. Here is what a function might look like:

uint32_t Client::GetGrpcRetryInterval(void)
{
    LOG_TRACE("+ Client::GetGrpcRetryInterval");

    Configuration config = this->_configFile->GetConfig();
   
    LOG_TRACE("- Client::GetGrpcRetryInterval return %d", config.grpc_retry_interval());
    return config.grpc_retry_interval();
}

The purpose of the rule was to enforce the coding standard, checking that each function conforms to the standard. The rule flags different coding errors, among them:

The function does not have both entry and exit logging statements
The function does not have two entry logging statements
The function name in the logging statements matches the function definition's full function name exactly

The rule's logic is simple:

When the function's FUNCTION token is encountered, save its token.fullName.
When a token with (type, value) = ('ID', 'LOG_TRACE') is encountered, check the correctness of the STRING passed to LOG_TRACE with the previously saved full function name.

When I ran the rule on our entire code base, it correctly flagged hundreds of coding errors. However, it incorrectly flagged overloaded operator functions as the LOG_TRACE strings not matching the function definition's full name. Here's an example to illustrate the problem:

Thread& Thread::operator=(Thread &other)
{
    LOG_TRACE("+ Thread::operator=");
    ...
    LOG_TRACE("- Thread::operator=");
    return *this;
}

This produces the following lexer.tokenlist:

tokenlist = {list: 42}
 00 = {LexToken} LexToken(ID,'Thread',1,1,0, False, None)
 01 = {LexToken} LexToken(AND,'&',1,7,6, False, None)
 02 = {LexToken} LexToken(SPACE,' ',1,8,7, False, None)
 03 = {LexToken} LexToken(ID,'Thread',1,9,8, False, None)
 04 = {LexToken} LexToken(DOUBLECOLON,'::',1,15,14, False, None)
 05 = {LexToken} LexToken(FUNCTION,'operator',1,17,16, False, None)
 06 = {LexToken} LexToken(EQUALS,'=',1,25,24, False, None)
 07 = {LexToken} LexToken(LPAREN,'(',1,26,25, False, None)
 08 = {LexToken} LexToken(ID,'Thread',1,27,26, False, None)
 09 = {LexToken} LexToken(SPACE,' ',1,33,32, False, None)
 10 = {LexToken} LexToken(AND,'&',1,34,33, False, None)
 11 = {LexToken} LexToken(ID,'other',1,35,34, False, None)
 12 = {LexToken} LexToken(RPAREN,')',1,40,39, False, None)
 13 = {LexToken} LexToken(LINEFEED,'\n',1,41,40, False, None)
 14 = {LexToken} LexToken(LBRACE,'{',2,1,41, False, None)
 15 = {LexToken} LexToken(LINEFEED,'\n',2,2,42, False, None)
 16 = {LexToken} LexToken(SPACE,'    ',3,1,43, False, None)
 17 = {LexToken} LexToken(ID,'LOG_TRACE',3,5,47, False, None)
 18 = {LexToken} LexToken(LPAREN,'(',3,14,56, False, None)
 19 = {LexToken} LexToken(STRING,'"+ Thread::operator="',3,15,57, False, None)
 20 = {LexToken} LexToken(RPAREN,')',3,36,78, False, None)
 21 = {LexToken} LexToken(SEMI,';',3,37,79, False, None)
 22 = {LexToken} LexToken(LINEFEED,'\n',3,38,80, False, None)
 23 = {LexToken} LexToken(SPACE,'    ',4,1,81, False, None)
 24 = {LexToken} LexToken(ELLIPSIS,'...',4,5,85, False, None)
 25 = {LexToken} LexToken(LINEFEED,'\n',4,8,88, False, None)
 26 = {LexToken} LexToken(SPACE,'    ',5,1,89, False, None)
 27 = {LexToken} LexToken(ID,'LOG_TRACE',5,5,93, False, None)
 28 = {LexToken} LexToken(LPAREN,'(',5,14,102, False, None)
 29 = {LexToken} LexToken(STRING,'"- Thread::operator="',5,15,103, False, None)
 30 = {LexToken} LexToken(RPAREN,')',5,36,124, False, None)
 31 = {LexToken} LexToken(SEMI,';',5,37,125, False, None)
 32 = {LexToken} LexToken(LINEFEED,'\n',5,38,126, False, None)
 33 = {LexToken} LexToken(SPACE,'    ',6,1,127, False, None)
 34 = {LexToken} LexToken(RETURN,'return',6,5,131, False, None)
 35 = {LexToken} LexToken(SPACE,' ',6,11,137, False, None)
 36 = {LexToken} LexToken(TIMES,'*',6,12,138, False, None)
 37 = {LexToken} LexToken(THIS,'this',6,13,139, False, None)
 38 = {LexToken} LexToken(SEMI,';',6,17,143, False, None)
 39 = {LexToken} LexToken(LINEFEED,'\n',6,18,144, False, None)
 40 = {LexToken} LexToken(RBRACE,'}',7,1,145, False, None)
 41 = {LexToken} LexToken(LINEFEED,'\n',7,2,146, False, None)

The problem lies within token[5] above:

lexer.tokenlist[5] = {LexToken} LexToken(FUNCTION,'operator',1,17,16, False, None)
 additional = {str} ''
 column = {int} 17
 context = {Context} <nsiqcppstyle_checker.Context object at 0x05046AB0>
 contextStack = {ContextStack} 
 decl = {bool} False
 filename = {str} 'D:\\Junk\\NsiqCppStyle\\operator-2.cpp'
 fullName = {str} 'operator::'
 inactive = {bool} False
 index = {int} 5
 lexer = {Lexer} <nsiqcppstyle_lexer.Lexer object at 0x05044330>
 lexpos = {int} 16
 line = {str} 'Thread& Thread::operator=(Thread &other)'
 lineno = {int} 1
 pp = {NoneType} None
 type = {str} 'FUNCTION'
 value = {str} 'operator'

Although the value ("operator") is correct, the token.fullName value found is "operator::", which is wrong on several counts:

Since this is the FUNCTION token, I would expect token.fullName to be "Thread::operator=". If it were correctly "Thread::operator=", the STRING checks are straightforward and consistent for overloaded- and non-overloaded functions alike.
The colon (":") following "operator" is incorrect.

IMHO, an overloaded operator function is just like any other function name. We should not consider the name to be the string "operator" followed by an equal ("=") character, but rather a function whose name just happens to be "operator=".

I think the opposing question should be posed: in what case would a rule writer want the FUNCTION token.fullName to only correspond to "operator", leaving the work of concatenating the operator etc. to the rule writer. I had no need to split the overloaded operator function name - I wanted to know the full name. Should the rule writer want to know the precise operator, he can easily take a slice of the function name, or look at the following token type.

Jul 01 '22 14:07 mosherubin