mwparserfromhell
Templates with unbalanced '<' or '>' inside not parsed correctly
For example, the parser treats {{#expr:8<2}} as a single Text node instead of a Template. If the < is removed, it parses correctly.
To reproduce this issue:
>>> import mwparserfromhell
>>> mwparserfromhell.parse('{{#expr:8<2}}').nodes[0]
'{{#expr:8<2}}'
>>> type(_)
<class 'mwparserfromhell.nodes.text.Text'>
>>> mwparserfromhell.parse('{{#expr:82}}').nodes[0]
'{{#expr:82}}'
>>> type(_)
<class 'mwparserfromhell.nodes.template.Template'>
>>>
This is also a problem with unbalanced ''' in a parameter value
@RheingoldRiver – see #40.
The original problem in this issue is related to the lack of support for parser functions, I think. We treat "#expr:8<2" as a template name and reject it because < cannot be in a page title. We would want to loosen validation on templates that look like parser functions, but I'm not sure how to decide this without other information. Checking whether the name starts with "#" is insufficient, because you have things like {{urlencode:foo>bar}}, which is valid, in contrast to {{template:foo>bar}}, which is not. Since namespaces are localizable and parser functions can be installed by extensions, this seems very evil.
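To illustrate why a prefix check alone falls short, here is a minimal sketch of the decision being discussed. It is not mwparserfromhell code: looks_like_parser_function, starts_with_hash, and KNOWN_FUNCTIONS are hypothetical names, and the set stands in for the "other information" (the wiki's installed, localized parser functions) that the parser does not have. Lowercasing the prefix is a simplification as well.

```python
# Hypothetical sketch; none of these names exist in mwparserfromhell.
# KNOWN_FUNCTIONS stands in for site-specific configuration that a parser
# would need in order to recognize parser functions reliably.
KNOWN_FUNCTIONS = {"#expr", "#if", "urlencode"}


def looks_like_parser_function(name, known=KNOWN_FUNCTIONS):
    """Return True if the text before the first ':' is a known function."""
    prefix, sep, _rest = name.partition(":")
    if not sep:
        return False  # no colon, so treat it as an ordinary template name
    # Assumption: names are compared case-insensitively.
    return prefix.strip().lower() in known


def starts_with_hash(name):
    """The naive check: only a leading '#' counts."""
    return name.lstrip().startswith("#")


for name in ("#expr:8<2", "urlencode:foo>bar", "template:foo>bar"):
    print(name, looks_like_parser_function(name), starts_with_hash(name))
```

The naive check misses urlencode:foo>bar, while the lookup accepts it and still rejects template:foo>bar. The catch is that the lookup only works if someone supplies the right set for the wiki in question.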
A very similar problem occurs with wikilinks:
>>> w = mwparserfromhell.parse("[[foo#bar < baz]]")
>>> print(w.get_tree())
[[foo#bar < baz]]
>>> w = mwparserfromhell.parse("[[foo|bar < baz]]")
>>> print(w.get_tree())
[[
foo
| bar < baz
]]
Is the link rejected because < cannot be in a page title? #40 does not seem to be the culprit, because the unbalanced < does not cause problems when it is in the text part.
I'm pretty sure it's because < can't be in a page title; mwparser doesn't know that the fragment (the part following #) has different allowed characters...
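A rough sketch of what fragment-aware validation could look like, assuming the title check is a simple character blacklist. This is not how mwparserfromhell implements it: title_is_plausible and INVALID_TITLE_CHARS are hypothetical, and the exact character set is an assumption.

```python
# Hypothetical sketch, not mwparserfromhell's implementation.
# Assumed set of characters that cannot appear in a page title.
INVALID_TITLE_CHARS = set("<>[]{}|")


def title_is_plausible(link_target):
    """Validate only the page part; ignore the fragment after the first '#'."""
    page, _sep, _fragment = link_target.partition("#")
    return not any(ch in INVALID_TITLE_CHARS for ch in page)


print(title_is_plausible("foo#bar < baz"))  # < is only in the fragment
print(title_is_plausible("foo < bar#baz"))  # < is in the page part
```

With this split, a < in the fragment no longer disqualifies the link, while a < in the page part still does.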
This is also a problem with unbalanced ''' in a parameter value
I've also run into this issue.