line_comment_prefix lines get rendered as blank lines
If I set
line_comment_prefix="##"
in my environment, and then include in my template
## start
## empty line follows:
##
## end
Then it renders to 4 blank lines, which is clearly silly. It should not render any of them.
See also #52.
I'd say this could be the desired behavior. HOWEVER, the lines are also kept when trim_blocks and lstrip_blocks are true, which they probably shouldn't be.
Interestingly though, they are correctly deleted when marked as ##- comment, but the line is kept when there is already some markup on it, so the remove-if-only-whitespace-before-else-keep logic discussed in that PR is already there.
import jinja2
e = jinja2.Environment(line_comment_prefix="##")
e_trim = jinja2.Environment(
    line_comment_prefix="##", lstrip_blocks=True, trim_blocks=True
)
print("== Comment line is correctly kept")
print(e.from_string("markup\n## comment\nmarkup").render())
print("== Comment line is incorrectly kept")
print(e_trim.from_string("markup\n## comment\nmarkup").render())
print("== Comment line is correctly removed")
print(e.from_string("markup\n##- comment\nmarkup").render())
print("== Comment line is correctly kept")
print(e.from_string("markup\nmarkup ##- comment\nmarkup").render())
I looked into how the markup ##- comment vs ##- comment cases work: in Lexer.tokeniter, for markup\n##- comment, the regex match is markup\n##-. The - is detected, and the markup part is rstripped.
In the markup\nmarkup ##- comment case, the same is performed, but there is, obviously, no newline to rstrip.
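To make the two cases concrete, here is a toy model of the stripping step in plain Python. It is not Jinja's lexer: text_before_comment is a hypothetical helper that only assumes the behaviour described above, namely that the text preceding a ##- comment is passed through rstrip.

```python
def text_before_comment(source: str, prefix: str = "##") -> str:
    """Toy model (not Jinja code): return the text preceding the first
    line comment, right-stripped when the comment is marked with '-'."""
    head, sep, tail = source.partition(prefix)
    if not sep:
        # No line comment in the source at all.
        return source
    if tail.startswith("-"):
        # Assumed to mirror the rstrip described above: all trailing
        # whitespace is removed, newlines included.
        return head.rstrip()
    return head


# The comment sits on its own line: the newline before it is stripped.
print(repr(text_before_comment("markup\n##- comment\nmarkup")))
# The comment follows markup on the same line: only the space is stripped,
# the earlier newline is untouched.
print(repr(text_before_comment("markup\nmarkup ##- comment\nmarkup")))
```

Since rstrip does not stop at the first newline, the same model would also swallow any number of blank lines in front of the comment.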
This also leads to the following bug:
import jinja2
e = jinja2.Environment(line_comment_prefix="##")
print(e.from_string("markup\n\n\n\n\n\n##- comment\nmarkup").render() == "markup\nmarkup")
# True
I don't think there's a very simple solution to this.
What's the bug in that last example? It seems correct.
I think this might be related to how we handle newlines when tokenizing in general. There are some other issues related to that as well. Basically we probably want to break tokens at newlines instead of trying to keep everything in a single text node, and handle lstrip, trim, - and + consistently as post processing on each line. Or at least that's the general gist of what I remember from last time I looked.
Jinja currently groups lines together and does some other things to group constant nodes together for optimization, but I have a feeling it's not actually optimizing much at the expense of making the regexes and lexer more complicated to reason about.
See https://github.com/pallets/jinja/issues/408#issuecomment-556501327 and https://github.com/pallets/jinja/pull/1109 (closed but relevant) for more newline issues.
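As a rough illustration of the per-line idea, the sketch below handles only line comments and works on plain strings, not on Jinja tokens. drop_line_comments is a hypothetical name, and the rule it applies (drop a line entirely when it holds nothing but a comment, otherwise keep the markup) is an assumption about the desired outcome, not a description of the current lexer.

```python
def drop_line_comments(source: str, prefix: str = "##") -> str:
    """Hypothetical per-line pass (not Jinja's lexer): remove line comments,
    deleting the whole line when the comment is the only thing on it."""
    out = []
    for line in source.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        newline = line[len(body):]  # the line terminator, possibly empty
        before, sep, _comment = body.partition(prefix)
        if not sep:
            # No comment on this line: keep it unchanged.
            out.append(line)
        elif before.strip() == "":
            # Only whitespace before the comment: drop line and terminator.
            continue
        else:
            # Markup before the comment: keep the markup and the terminator.
            out.append(before.rstrip() + newline)
    return "".join(out)


template = "## start\n## empty line follows:\n##\n## end\n"
print(repr(drop_line_comments(template)))
print(repr(drop_line_comments("markup\n\n\n## comment\nmarkup")))
```

Because every decision is made per line, blank lines that precede a comment are never touched, which is the property the single-text-node approach makes hard to guarantee.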
Yes, during debugging I noticed that jinja parses across multiple newlines at once. I don't think this is an issue per se, but it just needs to take care when stripping and decide when it should strip newlines and how many.
Why I think the above is a bug: the whitespace control docs say that trim_blocks removes one newline after a tag, and that lstrip_blocks removes whitespace before a block only on the given line. Marking statements and comments with - originally sounded to me like a way to activate this for one specific block. That it actually removes all whitespace up to the nearest non-whitespace character (in both directions for block statements) became clear only after I fiddled with it for a while. I made a suite of test cases for line comments (see the commit referencing this issue above this comment) according to how I would expect it to behave (they obviously fail for now). I think I know how to fix it without breaking other stuff, so I could try to look into it more in the following days. However, it first needs to be decided how it should actually behave (and I would argue for clarifying it in the docs too).
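The difference between the two readings can be shown on the six-newline input from earlier in the thread. Both helpers below are hypothetical and operate on the text in front of the comment only. strip_current_line_only is just one possible line-local rule (indentation on the comment's line plus at most one newline); whether a newline should be consumed at all is exactly the open question.

```python
import re


def strip_all_whitespace(preceding: str) -> str:
    """Observed behaviour: everything back to the last non-whitespace
    character is removed."""
    return preceding.rstrip()


def strip_current_line_only(preceding: str) -> str:
    """Hypothetical line-local rule: remove the indentation on the comment's
    own line and at most one newline in front of it."""
    return re.sub(r"(?:\r\n|\r|\n)?[^\S\r\n]*\Z", "", preceding, count=1)


preceding = "markup\n\n\n\n\n\n"  # text in front of "##- comment"
rest = "\nmarkup"                 # what follows the comment

print(repr(strip_all_whitespace(preceding) + rest))
print(repr(strip_current_line_only(preceding) + rest))
```

Under the line-local rule the five blank lines between the two markup lines survive and only the comment line itself disappears.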
Furthermore, when I replace the comment with a no-op statement (## a = True), it fails in a different way, i.e. line comments and line statements behave differently.