comment_parser: Python parser can now identify triple-quoted string comments

Open itiel opened this issue 3 years ago • 1 comments

The Python comment parser can now identify two kinds of comments using the tokenize module.

Single-line comments, which begin with the # character and end at a line break.
Multi-line comments or docstrings, which are just triple-quoted strings (they start and end with ''' or """). They are told apart from regular strings by the type of the previous token, which must be a line break or an indentation token (NEWLINE, NL, INDENT or DEDENT), or no token at all (meaning the string is the first thing in the script). Even in cases like this:
```
my_string = \
'''this should not be considered a comment'''
my_string = \
  '''this should not either''' # <- notice the increasing indentation
my_string = \
    '''weird syntax anyway''' # <- but still valid indentation
```
the token preceding the string is the = operator, not a line break or an indentation token, because a backslash continuation produces no token of its own. That way, only triple-quoted strings preceded by a line break, an indentation token, or no token at all are treated as intended comments.
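The rule above can be sketched roughly as follows. This is a minimal illustration and not the code from this pull request: the names `PREV_TYPES` and `find_comments` are made up for the example, and it reads the source through `tokenize.generate_tokens` on a text stream, which may differ from how comment_parser feeds the tokenizer.

```python
import io
import tokenize

# Token types allowed directly before a triple-quoted string that is
# treated as a comment (hypothetical name, not taken from the PR).
PREV_TYPES = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT}


def find_comments(source):
    """Return '#' comments and comment-like triple-quoted strings, in order."""
    comments = []
    prev_type = None  # None means no token has been seen yet
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments.append(tok.string)
        elif tok.type == tokenize.STRING:
            # Skip an optional string prefix (r, b, u, f) before checking quotes.
            body = tok.string.lstrip("rRbBuUfF")
            is_triple = body.startswith(("'''", '"""'))
            if is_triple and (prev_type is None or prev_type in PREV_TYPES):
                comments.append(tok.string)
        prev_type = tok.type
    return comments
```

Called on a script that mixes a module docstring, a backslash-continued assignment, a `#` comment and a function docstring, this sketch should keep the two docstrings and the `#` comment and skip the assigned string, since the token before that string is the `=` operator.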

This solves issue https://github.com/jeanralphaviles/comment_parser/issues/33

Apr 12 '21 19:04 itiel

Hi, your solution does not work for triple-quoted comments at the beginning of files (at least not for files other than the first one). However, it works once tokenize.ENCODING is added to multicommprevnums.

Jun 20 '21 20:06 tim-puhlfuerss
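A likely explanation, assuming the parser reads the file in binary mode through `tokenize.tokenize`: that function always emits an `ENCODING` token first, so a docstring at the top of a file is preceded by `ENCODING` rather than by no token. The helper name below is hypothetical and only demonstrates that behaviour.

```python
import io
import tokenize


def leading_token_names(source_bytes, count=2):
    """Return the names of the first `count` tokens of a bytes source."""
    tokens = tokenize.tokenize(io.BytesIO(source_bytes).readline)
    return [tokenize.tok_name[next(tokens).type] for _ in range(count)]


# A file that opens with a docstring: the string is the second token,
# right after the ENCODING token that tokenize.tokenize always yields.
print(leading_token_names(b'"""module docstring"""\n'))
```

Under that assumption, adding `tokenize.ENCODING` to the set of accepted previous token types covers the start-of-file case.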
