mwparserfromhell

get_tree() could be prettier and more configurable

Open · ctrlcctrlv opened this issue 4 years ago · 1 comment

First of all, this is a great parser, thanks for writing it.

This is a follow-up to #245, which taught me that mwparserfromhell.wikicode.Wikicode.get_tree is this library's pretty printer.

It doesn't seem to be configurable, though, and some of its output is overly verbose. That is to say, in trying to prettify the tree, it may be making it too pretty, and therefore not pretty. 🤗

Consider the following:

import mwparserfromhell

hlist = """{{hlist|{{#if:{{{1|}}}|'''[[{{{1}}}]]'''}}|{{#if:{{{2|}}}|'''[[{{{2}}}]]'''}}|{{#if:{{{3|}}}|'''[[{{{3}}}]]'''}}|{{#if:{{{4|}}}|'''[[{{{4}}}]]'''}}|{{#if:{{{5|}}}|'''[[{{{5}}}]]'''}}|{{#if:{{{6|}}}|'''[[{{{6}}}]]'''}}|{{#if:{{{7|}}}|'''[[{{{7}}}]]'''}}|{{#if:{{{8|}}}|'''[[{{{8}}}]]'''}}|{{#if:{{{9|}}}|'''[[{{{9}}}]]'''}}|{{#if:{{{text|}}}|{{{text}}}}}}}"""
T = mwparserfromhell.parse(hlist).get_tree()
print(T)

Output:

      {{
            hlist
          | 1
          = {{
                  #if:
                  {{{
                        1
                      | }}}
                | 1
                = <
                        b
                  >
                        [[
                              {{{
                                    1
                              }}}
                        ]]
                  </
                        b
                  >
            }}
          | 2
          = {{
                  #if:
                  {{{
...

Full output

Of course, I am aware that the best way to write this code is via a Lua module; this is just a real-world example I found in the English Wikipedia's Template:Userbox COI.

For me, "pretty" output would look more like:

{{ hlist
  | {{ #if: {{{1|}}} | <b>[[{{{1}}}]]</b> }}
  | {{ #if: {{{2|}}} | <b>[[{{{2}}}]]</b> }}
  | {{ #if: {{{3|}}} | <b>[[{{{3}}}]]</b> }}
  | {{ #if: {{{4|}}} | <b>[[{{{4}}}]]</b> }}
  | {{ #if: {{{5|}}} | <b>[[{{{5}}}]]</b> }}
  | {{ #if: {{{6|}}} | <b>[[{{{6}}}]]</b> }}
  | {{ #if: {{{7|}}} | <b>[[{{{7}}}]]</b> }}
  | {{ #if: {{{8|}}} | <b>[[{{{8}}}]]</b> }}
  | {{ #if: {{{9|}}} | <b>[[{{{9}}}]]</b> }}
  | {{ #if: {{{text|}}} | {{{text}}} }}
}}

It would be nice to be able to configure the indentation level and, for lack of a better term, a level of "linebreakyness" (how readily a node gets split across multiple lines).
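A minimal sketch of what those two knobs could look like, written against a hypothetical toy tree rather than mwparserfromhell's real node classes (`Text`, `Template`, `render_inline`, `pretty`, `indent` and `max_width` are illustrative names, not library API). Here "linebreakyness" is approximated by a width limit: a template stays on one line if its one-line rendering fits within `max_width` columns at its indentation; otherwise each parameter goes on its own line, indented by `indent` spaces.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical toy node types; they only mimic the shape of wikicode
# and are NOT mwparserfromhell's own classes.
@dataclass
class Text:
    value: str


@dataclass
class Template:
    name: str
    params: List["Node"] = field(default_factory=list)


Node = Union[Text, Template]


def render_inline(node: Node) -> str:
    """Render a node on one line, e.g. '{{ name | a | b }}'."""
    if isinstance(node, Text):
        return node.value
    parts = [node.name] + [render_inline(p) for p in node.params]
    return "{{ " + " | ".join(parts) + " }}"


def pretty(node: Node, indent: int = 2, max_width: int = 60, level: int = 0) -> str:
    """Keep a node on one line if it fits in max_width columns at its
    indentation; otherwise put each parameter on its own '| ' line."""
    pad = " " * (indent * level)
    inline = render_inline(node)
    if isinstance(node, Text) or len(pad) + len(inline) <= max_width:
        return pad + inline
    lines = [pad + "{{ " + node.name]
    for param in node.params:
        # Render the child one level deeper, then drop its leading padding
        # so that it starts right after the "| " marker.
        child = pretty(param, indent, max_width, level + 1).lstrip()
        lines.append(pad + " " * indent + "| " + child)
    lines.append(pad + "}}")
    return "\n".join(lines)


# The hlist example from above; for brevity, each #if: condition is
# folded into the (toy) template name.
params: List[Node] = [
    Template("#if: {{{%d|}}}" % i, [Text("<b>[[{{{%d}}}]]</b>" % i)])
    for i in range(1, 10)
]
params.append(Template("#if: {{{text|}}}", [Text("{{{text}}}")]))
hlist = Template("hlist", params)

print(pretty(hlist))
```

With the defaults this prints one `#if:` per line, matching the shape of the layout proposed above. Raising `max_width` collapses everything onto a single line, while lowering it (say to 30) also splits each `#if:` into its own block.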

Thoughts?

ctrlcctrlv · Jun 28 '20 07:06

Thoughts?

I agree. I think the best way to reason about this is in terms of the complexity of each (sub)expression: sufficiently complex nodes would be broken up with whitespace, while less complex ones would be kept on one line.
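To make this concrete, here is a minimal sketch under one simplifying assumption: the "complexity" of a (sub)expression is just the number of nodes in it. It uses a hypothetical toy tree of tuples rather than mwparserfromhell's node classes, and `complexity`, `pretty` and `max_complexity` are illustrative names, not part of the library.

```python
from typing import List, Tuple, Union

# Hypothetical toy tree (not mwparserfromhell's classes): a str is plain
# text and a (name, params) tuple is a template.
Node = Union[str, Tuple[str, List["Node"]]]


def complexity(node: Node) -> int:
    """Node count of the subtree: text is 1, a template is 1 + its params."""
    if isinstance(node, str):
        return 1
    _name, params = node
    return 1 + sum(complexity(p) for p in params)


def pretty(node: Node, max_complexity: int = 4, indent: int = 2, level: int = 0) -> str:
    """Keep simple templates on one line; split complex ones so that each
    parameter gets its own '| ' line."""
    if isinstance(node, str):
        return node
    name, params = node
    rendered = [pretty(p, max_complexity, indent, level + 1) for p in params]
    if complexity(node) <= max_complexity:
        # Simple enough: every child is at most as complex, so all are inline too.
        return "{{ " + " | ".join([name] + rendered) + " }}"
    pad = " " * (indent * level)
    lines = ["{{ " + name]
    lines += [pad + " " * indent + "| " + r for r in rendered]
    lines.append(pad + "}}")
    return "\n".join(lines)


# The hlist example, with each #if: condition folded into the template name.
hlist: Node = (
    "hlist",
    [("#if: {{{%d|}}}" % i, ["<b>[[{{{%d}}}]]</b>" % i]) for i in range(1, 10)]
    + [("#if: {{{text|}}}", ["{{{text}}}"])],
)
print(pretty(hlist))
```

Each `#if:` scores 2 and the whole `hlist` scores 21, so the default threshold of 4 breaks only the outer template. Because the decision depends only on structure, long but flat text never forces a break; a real implementation might combine this with a width limit, and would likely cache `complexity` instead of recomputing it at every level.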

earwig · Jun 29 '20 03:06