
ConTeXt writer: tag paragraphs

Open tarleb opened this issue 2 years ago • 13 comments

Paragraphs are enclosed by \startparagraph and \stopparagraph commands. This ensures better tagging results in PDF output.

tarleb avatar Feb 02 '22 22:02 tarleb

Demonstrating the difference (extracted from the PDF with pdfinfo -struct-text):

before

Div
  "Term Paper TitleBird2022-01-23"
  Sect "section"
    Div
      H (block)
        "Knuth"
    Div
      "Thus, I came to the conclusion that the designer of a new system must not onlybe the implementer and first large–scale user; the designer should also write thefirst user manual.The separation of any of these four components would have hurt TeX significantly. IfI had not participated fully in all these activities, literally hundreds of improvementswould never have been made, because I would never have thought of them orperceived why they were important.But a system cannot be successful if it is too strongly influenced by a single person.Once the initial design is complete and fairly robust, the real test begins as peoplewith many different viewpoints undertake their own experiments."

after

  "Term Paper TitleBird2022-01-23"
  Sect "section"
    Div
      H (block)
        "Knuth"
    Div
      P (block)
        "Thus, I came to the conclusion that the designer of a new system must not onlybe the implementer and first large–scale user; the designer should also write thefirst user manual."
      P (block)
        "The separation of any of these four components would have hurt TeX significantly. IfI had not participated fully in all these activities, literally hundreds of improvementswould never have been made, because I would never have thought of them orperceived why they were important."
      P (block)
        "But a system cannot be successful if it is too strongly influenced by a single person.Once the initial design is complete and fairly robust, the real test begins as peoplewith many different viewpoints undertake their own experiments."

I'm just not sure if the additional verbosity is worth it.

CC: @denismaier @klpn

tarleb avatar Feb 02 '22 22:02 tarleb

Sounds like a win to me, but let's see what the ConTeXt experts say.

jgm avatar Feb 02 '22 23:02 jgm

In general that's a useful addition, especially when going directly to PDF. But if you convert to ConTeXt sources, some might prefer the less verbose alternative. Maybe a new command line option could be useful?

denismaier avatar Feb 03 '22 14:02 denismaier

I suppose we could add an extension like tags or paragraph_tags. But I'm not sure how important this would be. If someone is using pandoc to generate ConTeXt that will then be hand-edited, and they don't want these things, they could always pipe the output through

sed -E -e '/\\(start|stop)paragraph/d'
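As a rough illustration of that filter, here is a sketch on a made-up sample. It assumes the writer puts each \startparagraph and \stopparagraph on a line of its own; the file names tagged.tex and plain.tex and the sample text are placeholders, not actual pandoc output.

```shell
# Hand-written stand-in for tagged ConTeXt output (an assumption, not real writer output).
cat > tagged.tex <<'EOF'
\startparagraph
First block of text.
\stopparagraph

\startparagraph
Second block of text.
\stopparagraph
EOF

# Delete every line that contains \startparagraph or \stopparagraph.
sed -E -e '/\\(start|stop)paragraph/d' tagged.tex > plain.tex

# Show the untagged result.
cat plain.tex
```

Because the expression deletes whole lines, it would also drop any text that happens to share a line with one of the commands.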

Interested in more feedback on this from ConTeXt users...

jgm avatar Feb 03 '22 14:02 jgm

Actually, the ConTeXt writer has access to variables, so why don't we just activate this feature if the pdfa variable is set? Would that be sensible?

jgm avatar Feb 03 '22 14:02 jgm

Checking the pdfa variable would make sense, IMHO.

It seems that there are a number of additional cases where we could improve tagging, e.g., in lists or for emphasized text: the ConTeXt wiki recommends defining \definehighlight[emph][style={\em}] and using \emph{text} instead of the normal {\em text}, as the former produces better tagging. The Export page in the wiki has a couple of additional examples. The end result looks quite different from "normal" ConTeXt, so yet another extension would be justifiable, too.

tarleb avatar Feb 03 '22 16:02 tarleb

Checking the pdfa variable is easy but slightly unprincipled. (Variables are supposed to be for template inclusion, so it's always a bit odd when they affect the body too.) So maybe adding a tagging extension would make sense. Not sure.

jgm avatar Feb 03 '22 21:02 jgm

I'm tempted to leave things as they are, but to use tagging as motivation for the new writer style and make_variant function that you suggested. Tagging-friendly ConTeXt would be a prime use case for this.

tarleb avatar Feb 03 '22 22:02 tarleb

I just found out about the effect of --section-divs on ConTeXt output (#2609). I think it might make sense to hide the suggested behavior behind that switch.

tarleb avatar Jun 04 '22 13:06 tarleb

I think having a separate tagging extension might be more principled. It wouldn't really be obvious why --section-divs ALSO puts tags around paragraphs.

Or maybe it wouldn't be that bad just to do the paragraph tagging by default for ConTeXt? It's the way of the future, presumably.

jgm avatar Jun 04 '22 16:06 jgm

I see, that's true. If we merge this, would it make sense to let the --section-divs behavior be the default? It seems like that would be the most consistent.

tarleb avatar Jun 04 '22 18:06 tarleb

If we merge this, would it make sense to let the --section-divs behavior be the default?

Agreed. I guess that would mean that we're only targeting ConTeXt MkIV, since older versions don't support \startsection/\stopsection. But at this point that's probably quite sensible. I think I'd be in favor of the simplest solution, and this is probably it.

The question @denismaier raised above is about the increased verbosity. I don't know how much of an issue that is for ConTeXt users.

jgm avatar Jun 05 '22 04:06 jgm

I was informed on the ConTeXt mailing list that using \startparagraph ... \stopparagraph leads to problems in some cases, e.g. in list items. The workaround is to use \bpar ... \epar instead. I don't understand yet whether it's preferable to always use those commands, or to use them only where \startparagraph would lead to unexpected results.
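If the answer turns out to be "always", existing sources could be converted mechanically. A minimal sketch, assuming each command sits on a line of its own and that \bpar/\epar are drop-in replacements (which is exactly the open question here); list.tex and list-bpar.tex are made-up file names and the list is a hand-written sample:

```shell
# Hand-written sample of a list item wrapped in paragraph commands (placeholder content).
cat > list.tex <<'EOF'
\startitemize
\item
  \startparagraph
  First item.
  \stopparagraph
\stopitemize
EOF

# Swap the paragraph environment for the \bpar ... \epar pair.
sed -E -e 's/\\startparagraph/\\bpar/' \
       -e 's/\\stopparagraph/\\epar/' list.tex > list-bpar.tex

# Show the converted list.
cat list-bpar.tex
```

This only rewrites the command names; whether the result tags correctly still has to be checked against the PDF structure.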

tarleb avatar Jun 05 '22 14:06 tarleb

I went ahead with the additional extension: if tagging is enabled, all paragraphs are wrapped in \bpar/\epar commands. Furthermore, we then generate \definehighlight commands for all used emphasis types and inject them via the emphasis-commands template variable.

Docs haven't been updated yet.

tarleb avatar Jan 14 '23 20:01 tarleb