comment_parser
Python parser can now identify triple-quoted string comments
The Python comment parser can now identify two kinds of comments using the `tokenize` module:
- Single-line comments, which begin with the `#` character and end with a line break.
- Multi-line comments or docstrings, which are just triple-quoted strings (starting and ending with `'''` or `"""`). They are told apart from regular strings by the type of the previous token, which should be a line break or an indentation (`NEWLINE`, `NL`, `INDENT` or `DEDENT`), or no token at all (meaning the string is the first thing in the script).

Even in cases like this:

    my_string = \
        '''this should not be considered a comment'''

    my_string = \
            '''this should not either''' # <- notice the increasing indentation

    my_string = \
    '''weird syntax anyway''' # <- but still valid indentation

the previous token to the string is the `=` operator and not a line break or an indentation. That way, only triple-quoted strings preceded by a line break, an indentation, or no token are considered to be intended as comments.
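
The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `find_comments` and `PRECEDING_TYPES` are hypothetical names, and the sketch assumes the source is tokenized from a `str` with `tokenize.generate_tokens`, which emits no `ENCODING` token.

```python
import io
import tokenize

# Token types that may precede a comment-like triple-quoted string.
# Hypothetical name; not taken from comment_parser.
PRECEDING_TYPES = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT}


def find_comments(source):
    """Return (line, text) pairs for # comments and comment-like triple-quoted strings."""
    comments = []
    prev_type = None  # None means "no token seen yet"
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments.append((tok.start[0], tok.string))
        elif tok.type == tokenize.STRING and tok.string.endswith(("'''", '"""')):
            # Only count the string when it opens a logical line or the file.
            if prev_type is None or prev_type in PRECEDING_TYPES:
                comments.append((tok.start[0], tok.string))
        prev_type = tok.type
    return comments
```

A backslash continuation produces no token of its own, so in `my_string = \` followed by a triple-quoted string, the token preceding the string is still the `=` operator and the string is skipped.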
This solves issue https://github.com/jeanralphaviles/comment_parser/issues/33
Hi, your solution does not work for triple-quoted comments at the beginning of files (at least not for files other than the first one).
However, it works when you add `tokenize.ENCODING` to `multicommprevnums`.
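
One possible explanation, assuming the parser reads the file as bytes through `tokenize.tokenize()`: that function always emits an `ENCODING` token first, so a docstring on the first line is preceded by `ENCODING` rather than by no token at all. The snippet below is only a sketch of that behaviour; `preceding_types` is a hypothetical stand-in for the parser's set of accepted token types.

```python
import io
import tokenize

source = b"'''module docstring'''\nx = 1\n"

# tokenize.tokenize() takes a bytes readline and yields ENCODING as its first token.
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
first, second = tokens[0], tokens[1]
print(tokenize.tok_name[first.type], tokenize.tok_name[second.type])  # ENCODING STRING

# Hypothetical stand-in for the accepted preceding token types, with ENCODING added.
preceding_types = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENCODING,
}
is_comment = second.type == tokenize.STRING and first.type in preceding_types
```

With `ENCODING` in the set, the leading docstring is accepted without needing a separate "no previous token" case.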