mwparserfromhell

Option to remove comments from template values

Open RheingoldRiver opened this issue 4 years ago • 6 comments

I'm not sure of the best way to implement this, but in the past I've run into issues where a template value has a comment in it and I want to compare just the value, ignoring the comment. My solution has been to strip the comments myself before comparing, but it would be really nice if the library handled this.
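For reference, the manual workaround described above might look roughly like the sketch below. It uses only the standard library; the helper names `without_comments` and `same_value` are made up for illustration, and the regex assumes every comment is properly closed with `-->`.

```python
import re

# Non-greedy match of an HTML-style comment; DOTALL lets it span lines.
# Assumption: comments are always closed with "-->".
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def without_comments(value: str) -> str:
    """Return the value with comments removed and outer whitespace trimmed."""
    return COMMENT_RE.sub("", value).strip()


def same_value(a: str, b: str) -> bool:
    """Compare two raw template values while ignoring comments."""
    return without_comments(a) == without_comments(b)
```

With this, `same_value("Foo <!-- fill in the name -->", "Foo")` holds even though the raw strings differ.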

Maybe an option relating to comments on the initial parse? It could strip out all comments, attempt to ignore them but leave them in place, or not treat comments specially. The first would be useful when there are documenting comments in template preloads that can be deleted once data has been added. The second is probably what I'd default to: it would attempt to preserve comments the way the parser attempts to preserve whitespace. The third, not treating them specially, would be the fallback for setups complicated enough that the middle option can't work properly.

Would something like this be possible? Thanks!

RheingoldRiver, Jan 03 '20 20:01

There is a more general issue with the abstract tree traversal and replacement (with an empty string in your case) - see #195.

lahwaacz, Jan 03 '20 21:01

I don't generally like adding options to the initial parse (there is only one right now, and it's there as a bug workaround), so if the existing behavior is insufficient, my preference would be for an easier way to express this in the wikicode object. Two ideas come to mind:

  • An easier way to remove all comment nodes from the template values after parsing, which might require fixing #195 if there is a performance problem. (How bad is it right now? Surely just a few lines of code?)
  • Some way to compare values that respects semantic equality (as opposed to structural equality). This is more general, and has a few different interpretations, but it could mean ignoring comments, normalizing ''italics'' to <i>italics</i>, normalizing page titles ({{foo}} == {{Template:Foo}})? Thinking about it more, this would be difficult to implement correctly and the exact semantics depend on the use case. If what you need is only to ignore comments, removing them directly seems like the best way?

earwig, Jan 07 '20 03:01

Hmm, yeah I think having matches ignore comments and work on param names/values as well would be sufficient to fix every issue I had - would that work?

RheingoldRiver, Jan 07 '20 03:01

Hold on, did you mean Wikicode.matches or am I getting mixed up? That should already ignore comments.

earwig, Jan 08 '20 01:01

Oh, I didn't realize matches already ignores comments. In that case, when I want to ignore comments I'll use the matches method. Can you add a note to the docstring that it ignores comments?

RheingoldRiver, Jan 08 '20 01:01

Thanks, I'll update it. (I had hoped "Specifically, whitespace and markup is stripped" made it clear, but I can see how that is ambiguous.)

earwig, Jan 08 '20 03:01