
Templates with unbalanced '<' or '>' inside not parsed correctly

Open lxylxy123456 opened this issue 5 years ago • 6 comments

For example, the parser parses {{#expr:8<2}} as a single string. If the < is removed, it parses correctly. To reproduce this issue:

>>> import mwparserfromhell
>>> mwparserfromhell.parse('{{#expr:8<2}}').nodes[0]
'{{#expr:8<2}}'
>>> type(_)
<class 'mwparserfromhell.nodes.text.Text'>
>>> mwparserfromhell.parse('{{#expr:82}}').nodes[0]
'{{#expr:82}}'
>>> type(_)
<class 'mwparserfromhell.nodes.template.Template'>

lxylxy123456 avatar Jan 02 '19 14:01 lxylxy123456

This is also a problem with unbalanced ''' in a parameter value

RheingoldRiver avatar Feb 12 '19 03:02 RheingoldRiver

@RheingoldRiver – see #40.

earwig avatar Feb 12 '19 04:02 earwig

The original problem in this issue is related to lack of support for parser functions, I think. We treat "#expr:8<2" as a template name and reject it because < cannot be in a page title. We would want to loosen validation on templates that look like parser functions, but I'm not sure how to actually decide this without other information; checking for the name starting with "#" is insufficient because you have things like {{urlencode:foo>bar}}, which is valid, in contrast to {{template:foo>bar}}, which is not. Since namespaces are localizable and parser functions can be installed by extensions, this seems very evil.
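To make the difficulty concrete, here is a rough sketch of what such a check might look like if the caller supplied the list of parser function names. This is not mwparserfromhell code: looks_like_parser_function and KNOWN_FUNCTIONS are hypothetical names, the set is a small illustrative sample, and the case-insensitive comparison is an assumption.

```python
# Hypothetical, non-exhaustive sample. The real list depends on the wiki's
# language and installed extensions, so the caller would have to supply it.
KNOWN_FUNCTIONS = frozenset({"urlencode", "lc", "uc", "padleft"})


def looks_like_parser_function(name, known=KNOWN_FUNCTIONS):
    """Guess whether a template name should skip page-title validation."""
    head, sep, _ = name.partition(":")
    if not sep:
        # No colon, so there is no function-style argument at all.
        return False
    head = head.strip()
    if head.startswith("#"):
        # Covers names such as "#expr" or "#if".
        return True
    # Assumption: compare case-insensitively against the supplied names.
    return head.lower() in known
```

With this sample set, "#expr:8<2" and "urlencode:foo>bar" are both treated as parser functions, while "template:foo>bar" is not. The answer depends entirely on the supplied list, which is exactly the information the parser does not have.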

earwig avatar Mar 13 '19 04:03 earwig

A very similar problem occurs with wikilinks:

>>> w = mwparserfromhell.parse("[[foo#bar < baz]]")
>>> print(w.get_tree())
[[foo#bar < baz]]
>>> w = mwparserfromhell.parse("[[foo|bar < baz]]")
>>> print(w.get_tree())
[[
      foo
    | bar < baz
]]

Is this because < is rejected as a character that cannot appear in a page title? #40 does not seem to be the culprit, since the unbalanced < causes no problems when it is in the text part.

lahwaacz avatar Jan 05 '20 10:01 lahwaacz

I'm pretty sure it's because < can't be in a page title; mwparser doesn't know that the fragment (the part following #) has different allowed characters...
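As a rough illustration of treating the fragment separately, the sketch below validates only the page part of a link target. It is not the parser's actual logic: is_plausible_link_target and INVALID_TITLE_CHARS are hypothetical names, the character set is an assumption about what a title may not contain, and the fragment is accepted unchecked for simplicity.

```python
# Assumed set of characters that cannot appear in a page title.
# "#" is handled separately because it starts the fragment.
INVALID_TITLE_CHARS = frozenset("<>[]{}|")


def is_plausible_link_target(target):
    """Check only the page part of a link target, ignoring the fragment."""
    page, _, _fragment = target.partition("#")
    # Simplification: the fragment is not validated at all here.
    return not any(ch in INVALID_TITLE_CHARS for ch in page)
```

Under these assumptions "foo#bar < baz" passes because the < sits in the fragment, while "foo < bar" is still rejected.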

earwig avatar Jan 06 '20 04:01 earwig

RheingoldRiver wrote: "This is also a problem with unbalanced ''' in a parameter value"

I've also run into this issue.

kzim44 avatar Jan 02 '21 00:01 kzim44