
Python parser can now identify triple-quoted string comments

Open • itiel opened this issue 3 years ago • 1 comment

The Python comment parser can now identify two kinds of comments using the tokenize module.

  • Single-line comments, which begin with the # character and end at a line break.

  • Multi-line comments or docstrings, which are just triple-quoted strings (they start and end with ''' or """). They are told apart from regular strings by the type of the previous token, which should be a line break or an indentation change (NEWLINE, NL, INDENT or DEDENT), or no token at all (meaning the string is the first thing in the script). Even in cases like this:

    my_string = \
    '''this should not be considered a comment'''
    my_string = \
      '''this should not either''' # <- notice the increasing indentation
    my_string = \
        '''weird syntax anyway''' # <- but still valid indentation
    

    the token preceding the string is the = operator, not a line break or an indentation token. That way, only triple-quoted strings preceded by a line break, an indentation token, or no token at all are treated as intended comments.
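The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the names `find_comments` and `PREV_TYPES` are hypothetical, and it assumes the string-based `tokenize.generate_tokens()` entry point, which does not emit an `ENCODING` token.

```python
import io
import tokenize

# Token types that may precede a triple-quoted string used as a comment.
# (Hypothetical name; the PR's own list may be named and built differently.)
PREV_TYPES = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT}


def find_comments(source):
    """Return (line_number, text) pairs for comments found in source."""
    comments = []
    prev_type = None  # None means no token has been seen yet
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # Single-line comment starting with '#'
            comments.append((tok.start[0], tok.string))
        elif tok.type == tokenize.STRING:
            # Drop any string prefix (r, b, u) before checking the quotes
            body = tok.string.lstrip("rRbBuU")
            is_triple = body.startswith(('"""', "'''"))
            if is_triple and (prev_type is None or prev_type in PREV_TYPES):
                comments.append((tok.start[0], tok.string))
        prev_type = tok.type
    return comments
```

A backslash line continuation produces no token of its own, so in `my_string = \` followed by a triple-quoted string the previous token is still the `=` operator and the string is skipped.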

This solves issue https://github.com/jeanralphaviles/comment_parser/issues/33

itiel Apr 12 '21 19:04

Hi, your solution does not work for triple-quoted comments at the beginning of files (at least not for files other than the first one). However, it works when you add tokenize.ENCODING to multicommprevnums.

tim-puhlfuerss Jun 20 '21 20:06
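One possible explanation for the ENCODING observation, offered as a small check rather than a diagnosis of the PR's code: the bytes-based `tokenize.tokenize()` always emits an `ENCODING` token first, while the string-based `tokenize.generate_tokens()` does not. A docstring on the first line is therefore preceded by `ENCODING` in the former case and by no token in the latter. The variable names below are illustrative only.

```python
import io
import tokenize

source = '"""module docstring"""\n'

# tokenize.tokenize() reads bytes and emits an ENCODING token first.
bytes_tokens = list(
    tokenize.tokenize(io.BytesIO(source.encode("utf-8")).readline)
)

# tokenize.generate_tokens() reads str and emits no ENCODING token.
str_tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

print(tokenize.tok_name[bytes_tokens[0].type])  # ENCODING
print(tokenize.tok_name[bytes_tokens[1].type])  # STRING
print(tokenize.tok_name[str_tokens[0].type])    # STRING
```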