ejs
ejs copied to clipboard
How should it work? => literal <%% %%> handling / ignored unmatched closing %> tags
This is a request to discuss/define the desired behaviour. I might (depending on the outcome) be willing to work on it.
I don't think the current behaviour is desired.
ejs.render(' %%> _%> -%>')
=> ' %> _%>'
- %%> gives a literal (ok), but also changes the next none literal closing tag _%> into a literal.
- _%> is just a closing tag with on open tag. It gets ignored (rather than giving an error)
(1) seems just wrong to me.
I could see ejs.render('<%% foo %>')
=> '<% foo %>' making sense. (See #235 #185 )
But it does not work, as it stops on any tag:
ejs.render('<%% foo <%= "bar" %> %%>')
=> '<% foo bar %>'
"bar" is evaluated by <%=, that is why the " are gone in the result.
Also what is the wanted behaviour, if they occur inside a tag (e.g inside an eval)?
It may be needed that a piece of javascript contains a <%, but that is currently not possible
ejs.render(' <%= "bar <%%" %> ')
ejs.render(' <%= "bar <%%" %> %>')
both give: Error: Could not find matching close tag for "<%=".
So the question is: Are <%% and %%>
- just single escaped <% and %>
- a pair of tags, that should mark their entire content as literal (and then the closing tag must be %%> because %> would be seen a literal text)
My opinion would be to make them single escapes for <% and %>.
- So they would not affect any other (following) tags.
- They could work inside any tag
That is an incompatible change, but I don't really see any way to make them consistent (in any way) without breaking compatibility.
If a "literal block" is wanted, a different syntax would be needed.
- "<%%%" would not work, as that already means escaped "<%" and a standalone "%"
- So something like <%[ .... ]%> maybe.
But I really don't see the need for that. Either
- escape each individual <%, %>
- change the delimiter.
- write a plugin (once #274 will be finished).
Loosely related: The ignorance of extra closing tags:
ejs.render('foo -%>\nbar')
=> 'foo bar'
IMHO closing tags that have no matching opening tag should NOT be ignored, but give an error. That can however be archived as an option. So the default behaviour can be kept, and they can be ignored.
But if they are ignored, is it really desired that they have the same "slurp" effects as proper tags?
If a matched <% ... -%>
is encountered, it is handled as a closing tag, and it slurps the next new line.
But in the above example
ejs.render('foo -%>\nbar')
=> 'foo bar'
It is not a tag. It is ignored. Should it slurp the new line?
I also have a few pending patches that might depend on this https://github.com/User4martin/ejs/tree/refactor-generate-source-full-tokens
I will discuss it later, and in its own thread. It's main objective is to reduce the amount of calls to scanLine;
- It simplifies scanLine
- actually speeds up compilation
It currently keeps the above behaviour (except, it may change things for <%% inside other tags => but that does not matter, is it currently always gives an error) But keeping the behaviour gives insane regex. (Still it speeds up compiling even with those regex).
This is the discussion I would ideally like to have about the spec for closing tags. @RyanZim do you have thoughts about the desired behavior here?
I'm not as familiar with the parsing logic as I should be, so perhaps some of this is impractical, but here goes:
Literals:
Literal handling is currently pretty crazy. I propose we substitute <%%
with <%
and %%>
with %>
; without regard for closing tags.
IMO, if you're using anything that looks like a literal or a closing tag inside your JS, you're doing something wrong. HTML parsing breaks out of scripts if </script>
is encountered in the middle of a JS string, and people work around it. They can work around the EJS parser too IMO.
I do not see the need for "Literal Blocks" personally. You can change the delimiter if you have too many conflicts with EJS tags. HTML doesn't allow you to configure anything, you've got to do <html%gt;
madness if you're writing a tutorial on HTML, yet people still use it.
Closing tags:
I'm not sure how dangling closing tags should be handled; I'm willing to tolerate a bit of odd behavior for a boost in parsing speed.
just to give some examples
current regexp:
var _REGEX_STRING_OPEN = '([^]*?)<%(?!%)([_=#-])?';
var _REGEX_STRING_CLOSE = '^((?:(?![-_%]?%>|<%(?!%))[^]|%%>)*)([-_]?)%>';
var _REGEX_STRING_LITERAL
= '(?:'
+ '(?:(?:(<%)%)|(?:%(%>)))'
+ '(?:'
+ '((?:(?!<%%|[%_-]?%>)[^])*?)'
+ '(?:(%>)|(?:(-%>)(?:\r\n|\r|\n)?)|(?:(_%>)[ \t]*(?:\r\n|\r|\n)?))'
+ ')?'
+ ')'
+ '|(?:(?:_%>[ \t]*(?:\r\n|\r|\n)?)|(?:-%>(?:\r\n|\r|\n)?)|%>)';
var _REGEX_STRING_LITERAL_REPL = '$1$2$3$4$5$6';
var _REGEX_STRING_TAG_LITERAL = '(<%)%|%(%>)';
var _REGEX_STRING_TAG_LITERAL_REPL = '$1$2';
Making literals simple (no longer look for closing tags) and remove extra closing, but no longer slurp
reduces the big regexp to
_REGEX_STRING_LITERAL
= '(?:(?:(<%)%)|(?:%(%>)))' // $1,$2: match literal
+ '|(?:[_-]?%>)'; // match any close tag that are not paired with a literal
_REGEX_STRING_LITERAL_REPL = '$1$2';
However I could not measure any improve in speed.
Removing literal support from within javascript expressions
I run 2 versions of this
- ignoring <%% %%> inside js (skip, but not replace)
- not ignoring them:
- <%% will be on opening tag, marking the previous as error (missing close)
- %%> will be closing the js early % %>
In both cases the savings where 8% to 10% (eg from 1.9 sec to 1.7 sec)
For the 2nd one Regexp got simpler
_REGEX_STRING_CLOSE = '^((?:(?!<%)[^])*?)([-_]?)%>';
In both cases
_REGEX_STRING_TAG_LITERAL = undefined;
Those savings where also archived if I made applying the regexp conditional on it being defined. In this case the choice can be made via options
allowing user choices
All of those changes can be made by changing regexp, or setting them to undefined (in that case, also adding if statements, where they would be applied.)
since createRegex could be overwritten by plugins, they could substitute regex for other needs. (assuming we add the (if regexp !== undefined)
checks.
This also means that we can ship with 2 sets of regexp. The long ones for current behaviour, and the new ones (whatever they will be). And then allow a choose by option (global, or per template)
extra closing tags. (no opening)
I think that in debugCompile (or make it an extra option opts.strictTags) they should give an error. I have not tested how much that would affect speed.
If speed changes are low enough, then they could always give an error, otherwise silently be dropped,
From a user point of view, I think that an unmatched closing tag is as much an error as an unmatched opening tag (and the latter give an error)
R̗͖̬͔̱̐ͤ̋e͓̘̠g̃͑͌͝ẻ̴͎̫͕͎̳x̖͕͕̓̔ ̹H̫̔ͤ̈́̊ọ̝̭̍̄̔́ͅr̺̥̝̲ͬ̅ͥ̀ṝ̩̫ͤ̿o̠̟͜r͑̂̇́ṡ̠̂!̻̟̘̣̌ͣ̌ Lets keep this as simple as possible, but no simpler.
I am not in favor of allowing user choices. EJS already has almost too many options, we need to keep things simple. Also, by exposing this, we limit ourselves in refactoring the parser, since we need to use the user-supplied regexes without breaking backwards compat.
If we can error on extra closing tags, that's great IMO.
I added the following changes:
-
<%%
and%%>
- are now stand alone.
-
<%% %>
does no longer make a literal out of the%>
- unmatched closing tags generate an error
- therefore there no longer is a question if they should slurp spaces
The commit is in https://github.com/User4martin/ejs/tree/refactor-generate-source-full-tokens (for other changes in this branch see #288 )
Making unmatched closing tags throw on error is only consequent.
It will also help anyone who used the old <%% ... %>
as 2 literals. Anyone who used that will get an error.
I added tests. I amended the docs.
About the docs. I only changed them to describe the new behaviour. I assume that the fact that the behaviour changed goes into the changelog.
But if wanted I can add to the docs (assuming next version will be 3.0:
The behaviour for literals changed in EJS version 3.0. In older releases any literal tag could optionally take a none literal tag as its closing tag. So before v3.0 <%% ... %>
produced two literals.
I wasn't sure if I should update the changelog, or if that is deferred until release.
If this change is ok, I can create a pull request for this.
@mde Do you prefer to merge my work in (small) steps? Or should I wait until all changes are ready and make one big pull request then?
Currently the first to steps from #288 would be ready from my side.
Added a small speed improvement to https://github.com/User4martin/ejs/tree/refactor-generate-source-full-tokens (I can put in on a separate branch / pull-request if desired)
We have a regex scanning for the open (or close) tag. This now also detects the presence of literals (but does nothing with them / it is a search, not a replace). However the info gets used to decide if the regex replace for literals must be run or can be skipped.
The extend of the speed gain, depends on how many literals there a present.
Without the check
node benchmark/bench-ejs.js.bak -r 19 --compile
Running avg accross: 19
name: avg med med/avg min max loops
single tmpl compile: 1.121 1.114 1.114 1.099 1.25 20000
single tmpl compile (debug): 1.36 1.34 1.34 1.3 1.604 20000
large tmpl compile: 3.435 3.37 3.373 3.245 4.077 1000
include-1 compile: 0.924 0.919 0.92 0.891 0.98 20000
With the check
node benchmark/bench-ejs.js.bak -r 19 --compile
Running avg accross: 19
name: avg med med/avg min max loops
single tmpl compile: 1.054 1.04 1.04 1.02 1.205 20000
single tmpl compile (debug): 1.272 1.254 1.254 1.226 1.445 20000
large tmpl compile: 2.777 2.742 2.737 2.596 3.558 1000
include-1 compile: 0.876 0.865 0.866 0.827 1.012 20000
I'm not @mde but I'd prefer if we started a develop
branch that we'd merge breaking changes into. Then we can merge develop
into master
before the release.
Branch is fine with me. We can also leave it in my fork, until ready.
I would like some feedback though on https://github.com/User4martin/ejs/tree/refactor-generate-source-full-tokens (including https://github.com/User4martin/ejs/tree/refactor-generate-source )
Just the concept. For example
- moving old style "include" or changing the regex.
- if the change to <%% will be accepted in the current form.
Small bits like formatting, variable names,... can be done on the final version
I will base all other work on them. So if there is any major stuff that I need to change on them, it would be nice to know.
Apologies for the delay in circling back to this. This is awesome work, and discussion. I wanted to make sure I have a thorough understanding of the various issues and questions before I chime in with feedback.
It seems to me we have two possible approaches:
- Continue this work in a separate branch (named "development" or otherwise), and merge the eventual result as a big-bang major-version update.
- Merge chunks of this work incrementally in as far as they are backward compatible and can be considered minor-version updates.
In my view, the very specific behavior for literal tags discussed here is not documented at all, and use of literals is far enough outside the mainstream usage that these changes could very well be shipped as part of a minor-version update. Backward compatibility under semver IMO should generally relate to documented features, or areas which, even if not specifically documented, are likely to be heavily used.
So this:
I added the following changes:
<%% and %%>
are now stand alone.
<%% %> does no longer make a literal out of the %>
unmatched closing tags generate an error
therefore there no longer is a question if they should slurp spaces
Could actually ship as a minor version bump.
Some other general feedback:
Yes, that new set of regexes is pretty crazy/complicated. At a high level, I think we should support some level of customization, but through specific APIs (e.g., overriding generateSource
). I am not opposed to using such complicated regexes internally, especially as part of an evolving codebase -- but exposing them to outside customization would really bake in that specific implementation, and make it impossible for us to simplify things later.
Any customization API should be designed initially from the point of view of an end-user developer, not just by conveniently making specific implementation details into publicly accessible/overrideable properties.
Thanks for the feedback. I am currently away from home. I will look into details when I am back.
The regex are currently not public. This was merely an idea, but was not pursuit.
Since I added code for adding open tag modifiers (#=-_) there is code that allows handling regex. This can later be extended (for closing tags)
Methods on the template class are from a technical point all accessible to users. But that does not make them part of the supported API. I added a md doc that details any method that is meant to be available for 3rd parties. (though that does not include yet the methods outside the class / like findInclude...)
More later.
use of literals is far enough outside the mainstream usage that these changes could very well be shipped as part of a minor-version update. Backward compatibility under semver IMO should generally relate to documented features, or areas which, even if not specifically documented, are likely to be heavily used.
I have to disagree here. EJS is one of the highly depended-upon packages on npm (currently the 51st most depended-upon npm package). We should make an effort to never ship any kind of possibly-breaking changes except in a major release. That may mean making major releases more often, but why fear the major release? It's just a number, after all.
Any customization API should be designed initially from the point of view of an end-user developer, not just by conveniently making specific implementation details into publicly accessible/overrideable properties.
I'm :100: with you here.