pandoc's reST class directive behaviour does not conform to the specification

Running this modified example through pandoc produces aberrant behaviour that does not conform to the class directive specification:

.. class:: special

This is a "special" paragraph.

.. class:: exceptional remarkable

An Exceptional Section
======================

This is an ordinary paragraph.

.. class:: multiple

   First paragraph.

   Second paragraph.

.. class:: test
.. container::

   First paragraph
      
   Second paragraph

Expected pseudo-XML, based on the specification:

<paragraph classes="special">
    This is a "special" paragraph.
<section classes="exceptional remarkable">
    <title>
        An Exceptional Section
    <paragraph>
        This is an ordinary paragraph.
    <paragraph classes="multiple">
        First paragraph.
    <paragraph classes="multiple">
        Second paragraph.
    <container classes="test">
        <paragraph>
            First paragraph.
        <paragraph>
            Second paragraph.

Output from https://pandoc.org/try:

<div class="special">
<p>This is a "special" paragraph.</p>
</div>
<h1 class="exceptional remarkable" id="an-exceptional-section">An
Exceptional Section</h1>
<p>This is an ordinary paragraph.</p>
<div class="multiple">
<p>First paragraph.</p>
<p>Second paragraph.</p>
</div>
<div class="test">
<div class="container">
<p>First paragraph</p>
<p>Second paragraph</p>
</div>
</div>

Except for headings, .. class:: appears to wrap content in <div> elements rather than applying the class to the next element. Note that the example is also currently subject to #10004.

Additional examples from the specification, illustrating that the class directive is not always applied to the next document element, as the spec requires:

.. note:: the class values set in this directive-block do not apply to
   the note but the next paragraph.

   .. class:: special

This is a paragraph with class value "special".

<div class="note">
<div class="title">
<p>Note</p>
</div>
<p>the class values set in this directive-block do not apply to the note
but the next paragraph.</p>
</div>
<p>This is a paragraph with class value "special".</p>

* bullet list

  .. class:: classy item

* second item, with class argument

<ul>
<li>bullet list</li>
<li>second item, with class argument</li>
</ul>

.. class:: highlights
..

    Block quote text.

<div class="highlights">

</div>
<blockquote>
<p>Block quote text.</p>
</blockquote>

All of these outputs were produced by https://pandoc.org/try with pandoc --from rst --to html5 --no-highlight.

Jul 18 '24 17:07 EnronEvolved

I had a poke around in RST.hs and found that headings behave correctly because a special case of this fault was already identified and fixed in #6699.

I don't know Haskell (or the codebase) well enough to knock a pull request together, but I think at least some of the aberrant behaviour can be remedied by widening the check for headings in RST.hs to all block elements. Getting full compliance (e.g. for list items) might need a little extra parser state.

Jul 22 '24 14:07 EnronEvolved

I've looked at the codebase, and I believe I understand why this behaviour, aberrant as it is, occurs: not all block elements have an Attr field in Pandoc's native representation.

I guess changing the way Pandoc represents documents would be a major breaking change, but maybe it's possible to work around it.

I believe that, with the possible exception of list items and block quotes (I still don't fully understand the parser), the following pseudocode could fix the issue:

let childList = B.toList children
if all acceptAttrs childList
  then return (pushAttrs attrs childList)
  else return (B.divWith (attrs <> "PANDOC-TRY-FLATTEN") children)

The types are definitely mismatched, but the gist is that, where necessary, the Div produced by the class directive should be marked in a way that either Pandoc's writers or an independent post-processor can recognise. I chose PANDOC-TRY-FLATTEN as the marker attribute because flattening the Div needs to be attempted to replicate the expected behaviour.

If all of the children can accept attributes, the expected behaviour can (and IMO should) be replicated exactly.
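A minimal Python sketch of the proposal above, assuming a toy stand-in for pandoc's AST rather than the real pandoc-types API. The set of constructors that carry an Attr (CodeBlock, Header, Table, Figure, Div) reflects recent pandoc-types as I understand it and may differ between versions; Block, accepts_attrs, push_classes and apply_class_directive are hypothetical names, and PANDOC-TRY-FLATTEN is the marker suggested above.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Block:
    kind: str                        # pandoc constructor name, e.g. "Para", "Header"
    text: str = ""
    classes: Tuple[str, ...] = ()
    children: Tuple["Block", ...] = ()


# Constructors assumed to carry an Attr in pandoc-types (check your version).
HAS_ATTR = {"CodeBlock", "Header", "Table", "Figure", "Div"}

# Hypothetical marker class from the proposal above.
FLATTEN_MARKER = "PANDOC-TRY-FLATTEN"


def accepts_attrs(block: Block) -> bool:
    return block.kind in HAS_ATTR


def push_classes(classes: List[str], block: Block) -> Block:
    return replace(block, classes=block.classes + tuple(classes))


def apply_class_directive(classes: List[str], children: List[Block]) -> List[Block]:
    """Blocks produced by a ".. class::" directive that has content."""
    if all(accepts_attrs(b) for b in children):
        # Every child can hold the classes itself: no wrapper needed.
        return [push_classes(classes, b) for b in children]
    # Otherwise wrap in a Div, tagged so that a writer or post-processor
    # can later try to flatten the classes back onto the children.
    marked = tuple(classes) + (FLATTEN_MARKER,)
    return [Block("Div", classes=marked, children=tuple(children))]
```

With the "multiple" example, the two Para children force the marked Div fallback, while a lone Header takes the classes directly, which mirrors pandoc's current handling of headings.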

Jul 23 '24 22:07 EnronEvolved

not all block elements have an Attr field in Pandoc's native representation.

Yes, that's the reason for the first divergence. We try to get as close as we can given that limitation (here, adding an enclosing Div).

Some day we may change the AST so that everything can take an Attr, but it's a big breaking change to the whole ecosystem, so rather painful.

As for

.. note:: the class values set in this directive-block do not apply to
   the note but the next paragraph.

   .. class:: special

This is a paragraph with class value "special".

The issue, I believe, is that pandoc parses everything in the indented block following .. note as a unit, so the .. class:: special isn't applied to the next paragraph. (To me, it's quite counterintuitive that this works in RST; can you point me to the relevant part of the RST spec?) If you unindent the .. class:: special, it should work.

Jul 28 '24 16:07 jgm

There's a link to the relevant section of the specification in the first paragraph of my opening comment. All of these test cases have been lifted from there, mostly from the section's footnotes. To quote:

The "class" directive sets the classes attribute value on its content or on the first immediately following [10] non-comment element [11]. The directive argument consists of one or more space-separated class names. The names are transformed to conform to the regular expression [a-z](-?[a-z0-9]+)* (see Identifier Normalization below).

Jul 29 '24 06:07 EnronEvolved

My question wasn't really about what .. class does, but rather about the rules for including something under .. note. Pandoc is subsuming the .. class under the note, because it's indented. That's why it doesn't affect the following paragraph.

Jul 29 '24 17:07 jgm

I'm not entirely sure the spec can be clearer than "on its content or on the first immediately following non-comment element". Where exactly the next non-comment element lies in the syntax tree isn't relevant: skip over comments and give the next non-comment element the class.

Like I said, this might need some extra parser state. A list of pending classes to add should do the trick.
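A minimal sketch of that extra parser state, in Python on a toy block tree rather than pandoc's RST.hs (Node, ParserState, add_classes and resolve are hypothetical names). One list of pending classes is shared across every nesting level, so classes left over at the end of the note fall through to the next non-comment element after it, and a class directive with content hands the pending classes to its first child only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                                           # "para", "note", "comment", "class", ...
    text: str = ""
    args: List[str] = field(default_factory=list)      # names given to ".. class::"
    children: List["Node"] = field(default_factory=list)
    classes: List[str] = field(default_factory=list)   # classes that end up on this node


@dataclass
class ParserState:
    pending: List[str] = field(default_factory=list)   # classes still waiting for a target


def add_classes(node: Node, names: List[str]) -> None:
    for name in names:
        if name not in node.classes:                    # drop duplicates, keep order
            node.classes.append(name)


def resolve(nodes: List[Node], state: ParserState) -> List[Node]:
    """Walk blocks in document order, sharing one ParserState across all
    nesting levels, and return the blocks with class directives resolved."""
    out: List[Node] = []
    for node in nodes:
        if node.kind == "comment":
            out.append(node)                            # comments never take classes
        elif node.kind == "class" and not node.children:
            state.pending.extend(node.args)             # no content: wait for a target
        elif node.kind == "class":
            # With content: each child gets the directive's own classes, then the
            # children are walked normally, so pending classes reach the first
            # non-comment child only.
            for child in node.children:
                if child.kind != "comment":
                    add_classes(child, node.args)
            out.extend(resolve(node.children, state))
        else:
            add_classes(node, state.pending)            # this is the next element
            state.pending = []
            node.children = resolve(node.children, state)
            out.append(node)
    return out
```

Anything still in state.pending after the walk has no target and would need to be reported. Even with the target resolved this way, pandoc would still need a Div for targets such as Para that have no Attr.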

It might be worthwhile playing around with docutils. It's the reference implementation and has a CLI that you can use to knock together simple test cases.

Here's another sample with some more esoteric test cases:

Test Please Ignore
==================

Shamabasnsdgsdgdsg
~~~~~~~~~~~~~~~~~~

Part 1: Origins
---------------

.. note::

   dfsfjlkef

   .. class:: test

   .. class:: othertest

.. Test comment please ignore

This is a test paragraph


.. class:: more

.. class:: MORE

.. class:: MOOOOORE!

.. class:: multiple again

   foo bar baz

   spam eggs

.. class:: foo

.. container::

   deeble

Here's what docutils makes of it:

<main id="test-please-ignore">
<h1 class="title">Test Please Ignore</h1>
<p class="subtitle" id="shamabasnsdgsdgdsg">Shamabasnsdgsdgdsg</p>

<section id="part-1-origins">
<h2>Part 1: Origins</h2>
<aside class="admonition note">
<p class="admonition-title">Note</p>
<p>dfsfjlkef</p>
</aside>
<!-- Test comment please ignore -->
<p class="test othertest">This is a test paragraph</p>
<p class="multiple again more mooooore">foo bar baz</p>
<p class="multiple again">spam eggs</p>
<div class="foo docutils container">
<p>deeble</p>
</div>
</section>
</main>

Note how class names are normalised, so that "more" and "MORE" merge into a single class. I'm not sure whether pandoc wants to do that. Also note that, while both children of the class directive with content get "multiple" and "again", only the first child receives the classes from the three preceding directives.

Jul 30 '24 08:07 EnronEvolved

first immediately following non-comment element

I would have thought that an element outside the scope of the note would not count as "immediately following" one inside it. Anyway, this is not clear, to me at least.

Jul 30 '24 15:07 jgm

But the behavior of the reference implementation does clarify how it's supposed to work.

Jul 30 '24 15:07 jgm

I think that if we want better conformance it might be necessary to rewrite the parser so that it uses a parsing strategy close to what docutils uses.
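The docutils strategy can be sketched as follows, based on my reading of its ClassAttribute transform: a class directive without content leaves a placeholder "pending" node in the tree, and a later pass walks forward through the following siblings, skipping invisible nodes such as comments and other pending nodes, and climbs to the parent when it runs out of siblings. The Python below mirrors that on a toy tree; Element, apply_pending and apply_all are hypothetical names, not the docutils API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

INVISIBLE = {"comment", "pending"}   # kinds the pass skips over


@dataclass(eq=False)                  # identity comparison: safe with parent links
class Element:
    kind: str
    text: str = ""
    classes: List[str] = field(default_factory=list)  # for "pending": the directive's names
    children: List["Element"] = field(default_factory=list)
    parent: Optional["Element"] = field(default=None, repr=False)

    def append(self, child: "Element") -> "Element":
        child.parent = self
        self.children.append(child)
        return child


def apply_pending(pending: Element) -> Element:
    """Move a pending node's classes onto the first following visible element."""
    parent, child = pending.parent, pending
    while parent is not None:
        start = parent.children.index(child) + 1
        for element in parent.children[start:]:
            if element.kind in INVISIBLE:
                continue
            for name in pending.classes:
                if name not in element.classes:
                    element.classes.append(name)
            pending.parent.children.remove(pending)
            return element
        # No visible sibling left at this level: climb and keep looking.
        child, parent = parent, parent.parent
    raise ValueError("no suitable element follows the class directive")


def apply_all(root: Element) -> None:
    pendings: List[Element] = []

    def collect(node: Element) -> None:
        for c in node.children:
            if c.kind == "pending":
                pendings.append(c)
            collect(c)

    collect(root)
    for p in pendings:                # process in document order
        apply_pending(p)
```

A class directive with content does not need a pending node in this model: its children are created with the directive's classes already set, and any earlier pending classes are appended to the first of them when the pass runs.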

Jul 30 '24 16:07 jgm