confluence-publisher

Convert HTML span with class to either <u> or <s>

Open lgringo opened this issue 3 years ago • 3 comments

Hi,

fix #358

I chose to use a post-processor to convert each span to the matching tag:

  • <span class="underline">Underline</span> -> <u>Underline</u>
  • <span class="line-through">Strikethrough</span> -> <s>Strikethrough</s>

This post-processor is named fixHtmlTags. It parses the entire XHTML output using the Java XML tooling. I tried to tune the serializer to stay as close as possible to the input source, but empty tags are serialized in the short form: <tag/>. So I had to fix all the tests.

As a consequence, every page containing empty tags written with separate start and end tags will be uploaded again on the next publish (without any visual change).
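To illustrate the side effect described above, here is a minimal sketch of a parse-and-serialize round trip with the standard JDK XML APIs. It is not the code of this pull request: the class name `EmptyTagRoundTrip` and the method `roundTrip` are hypothetical, and the exact serialized form may depend on the XML implementation on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only (not part of confluence-publisher).
public class EmptyTagRoundTrip {

    // Parses the given well-formed XHTML fragment into a DOM and serializes it back to a string.
    public static String roundTrip(String xhtml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Keep the output close to the input: no <?xml ...?> declaration.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The empty <p></p> pair is expected to come back in the short form.
        System.out.println(roundTrip("<div><p></p><p>text</p></div>"));
    }
}
```

With the JDK's built-in serializer, the empty `<p></p>` pair should come back as `<p/>` while the non-empty paragraph is unchanged, which is why the stored page content would differ even though nothing changes visually.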

lgringo, Feb 21 '22 17:02

Hi @lgringo, thank you for your contribution. The AsciidocConfluencePage already post-processes the converted HTML in some places, but it avoids actually parsing and serializing the HTML content, precisely to prevent the kind of unwanted modifications you mention. It uses regular expressions instead.

Do you see any chance of rewriting your post-processor to use regular expressions instead of XML parsing?
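As a rough idea of what a regex-based variant could look like, here is a minimal sketch. It is not the existing AsciidocConfluencePage code: the class name `SpanTagRegexSketch` and the method `convert` are hypothetical, and it assumes the `class` attribute is written exactly as shown and that no other `<span>` is nested inside the matched span.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, for illustration only (not part of confluence-publisher).
public class SpanTagRegexSketch {

    // Assumption: exact attribute text and no nested <span> inside the match.
    // The reluctant (.*?) stops at the first closing </span>.
    private static final Pattern UNDERLINE =
            Pattern.compile("<span class=\"underline\">(.*?)</span>", Pattern.DOTALL);
    private static final Pattern LINE_THROUGH =
            Pattern.compile("<span class=\"line-through\">(.*?)</span>", Pattern.DOTALL);

    // Rewrites the two span variants and leaves every other character untouched.
    public static String convert(String xhtml) {
        String underlined = UNDERLINE.matcher(xhtml).replaceAll("<u>$1</u>");
        return LINE_THROUGH.matcher(underlined).replaceAll("<s>$1</s>");
    }

    public static void main(String[] args) {
        System.out.println(convert(
                "<p><span class=\"underline\">Underline</span> and "
                + "<span class=\"line-through\">Strikethrough</span><a></a></p>"));
    }
}
```

Because the input is never parsed, empty elements such as `<a></a>` keep their original form. The trade-off is that a span containing another span would be closed at the wrong `</span>`, so nested cases would need extra handling.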

cstettler, Feb 24 '22 19:02

Hi @cstettler ,

Thanks for the review.

Finding matching HTML tags with a regex is not easy, but I think I can do it.

You are the author and I will respect your choice, but I really think parsing and serializing the generated XHTML is not a big deal, and it is easier. It could be nice for other contributors too. As I said, the serialized XHTML is exactly the same as the input, except for empty tags.

lgringo, Feb 25 '22 10:02

Hi @lgringo, are you still considering adjusting the approach for the HTML post-processing? Would be great to get this feature into the Confluence Publisher. Thank you very much!

cstettler, Sep 27 '22 06:09