MyST-Parser
How to respect hidden paragraphs?
Describe the bug
context
I’m migrating a Sphinx project from recommonmark to MyST, and noticed a few unexpected changes in formatting. One is the addition of <p> paragraph tags in list items. Take the following Markdown:
* A Python library
* A Sphinx extension
MyST converts this to:
<ul class="simple">
<li><p>A Python library</p></li>
<li><p>A Sphinx extension</p></li>
</ul>
This example comes from the getting started guide.
expectation
I would have expected:
<ul class="simple">
<li>A Python library</li>
<li>A Sphinx extension</li>
</ul>
This is the output of markdown-it-py, and the markdown-it live demo (not counting the simple class name).
bug
Instead, the <p> tags are added, which means extra vertical spacing and odd content semantics.
problem
I can’t think of a scenario where the extra vertical space, or the semantics, are desirable.
For example, in the MyST documentation, I can see this was worked around with CSS: https://github.com/executablebooks/sphinx-book-theme/blob/1b5c3889bce036c4116a7e35aed83668ff810357/src/sphinx_book_theme/assets/styles/base/_typography.scss#L46-L52
Reproduce the bug
- Create a Markdown document with an unordered or ordered list item
- Convert to HTML
List your environment
myst-parser==0.17.0
Heya, so the potential problem here is in the "balance" between CommonMark compliance and docutils/sphinx compliance:
Currently, for a file test.rst containing:
- a
- b
If you run rst2pseudoxml.py test.rst or myst-docutils-pseudoxml test.rst, you will end up with:
<document source="test.rst">
    <bullet_list bullet="-">
        <list_item>
            <paragraph>
                a
        <list_item>
            <paragraph>
                b
To get what you want, i.e. CommonMark compliance, one could try simply propagating the hidden metadata, e.g.
<document source="test.rst">
    <bullet_list bullet="-">
        <list_item>
            <paragraph hidden=true>
                a
        <list_item>
            <paragraph hidden=true>
                b
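A rough sketch of what "propagating the hidden metadata" could mean, using toy `Token` and `Node` classes (hypothetical stand-ins, not the markdown-it-py or docutils classes): while nesting the `*_open`/`*_close` tokens into a tree, the token's `hidden` flag is copied onto the resulting paragraph node as an attribute.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Toy markdown-it-style token (hypothetical, only the fields used here)."""
    type: str
    content: str = ""
    hidden: bool = False


@dataclass
class Node:
    """Toy docutils-style node (hypothetical, not the real API)."""
    tagname: str
    children: list = field(default_factory=list)
    text: str = ""
    attributes: dict = field(default_factory=dict)


def tokens_to_tree(tokens):
    """Nest *_open/*_close tokens into nodes, copying `hidden` onto paragraphs."""
    root = Node("document")
    stack = [root]  # the innermost open node is always last
    for tok in tokens:
        if tok.type.endswith("_open"):
            node = Node(tok.type[: -len("_open")])  # e.g. "bullet_list"
            if node.tagname == "paragraph" and tok.hidden:
                node.attributes["hidden"] = True
            stack[-1].children.append(node)
            stack.append(node)
        elif tok.type.endswith("_close"):
            stack.pop()
        elif tok.type == "inline":
            stack[-1].text += tok.content
    return root
```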
However, you would then need to get the HTMLTranslator to "respect" this flag, e.g. here: https://github.com/live-clones/docutils/blob/843f8341e01a8c68cbc7881164899c24af74f291/docutils/docutils/writers/html4css1/__init__.py#L743-L748
I'm not sure how that would be possible.
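One way to picture "respecting" the flag, assuming a toy translator rather than the real docutils class: the sketch loosely follows the visit/depart pattern of docutils' html4css1 `HTMLTranslator`, where `visit_paragraph` pushes the closing tag (or an empty string) onto `self.context` for `depart_paragraph` to pop. `Node`, `ToyHTMLTranslator`, `to_html` and the `hidden` attribute are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a docutils node (hypothetical, not the real API)."""
    tagname: str
    children: list = field(default_factory=list)
    text: str = ""
    attributes: dict = field(default_factory=dict)


class ToyHTMLTranslator:
    """Mimics the visit_*/depart_* pattern of docutils' HTML translators."""

    def __init__(self):
        self.body = []     # collected HTML fragments
        self.context = []  # closing tags pushed by visit_*, popped by depart_*

    def walk(self, node):
        getattr(self, f"visit_{node.tagname}")(node)
        if node.text:
            self.body.append(node.text)
        for child in node.children:
            self.walk(child)
        getattr(self, f"depart_{node.tagname}")(node)

    def visit_bullet_list(self, node):
        self.body.append("<ul>")

    def depart_bullet_list(self, node):
        self.body.append("</ul>")

    def visit_list_item(self, node):
        self.body.append("<li>")

    def depart_list_item(self, node):
        self.body.append("</li>")

    def visit_paragraph(self, node):
        # Respect the (hypothetical) hidden flag: emit no <p> wrapper at all.
        if node.attributes.get("hidden"):
            self.context.append("")
        else:
            self.body.append("<p>")
            self.context.append("</p>")

    def depart_paragraph(self, node):
        self.body.append(self.context.pop())


def to_html(tree):
    translator = ToyHTMLTranslator()
    translator.walk(tree)
    return "".join(translator.body)


tight = Node("bullet_list", [
    Node("list_item", [Node("paragraph", text="a", attributes={"hidden": True})]),
    Node("list_item", [Node("paragraph", text="b", attributes={"hidden": True})]),
])
print(to_html(tight))  # <ul><li>a</li><li>b</li></ul>
```

Without the `hidden` attribute the same tree would render with `<p>` inside each `<li>`, which is the current behaviour.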
Alternatively, you could "skip" hidden paragraphs, to generate:
<document source="test.rst">
    <bullet_list bullet="-">
        <list_item>
            a
        <list_item>
            b
But then this could break the rest of the docutils/sphinx build chain, which expects paragraph nodes 😬
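To make that risk concrete, a toy version of the "skip" approach (hypothetical `Node` class and helper names, not docutils API): hidden paragraphs are replaced by their content, and a later step that looks for a `paragraph` child of a `list_item` no longer finds one.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a docutils node (hypothetical, not the real API)."""
    tagname: str
    children: list = field(default_factory=list)
    text: str = ""
    attributes: dict = field(default_factory=dict)


def unwrap_hidden_paragraphs(node):
    """Replace each hidden paragraph with its content, recursively and in place."""
    new_children = []
    for child in node.children:
        unwrap_hidden_paragraphs(child)
        if child.tagname == "paragraph" and child.attributes.get("hidden"):
            # Keep the paragraph's text and children, drop the wrapper node.
            if child.text:
                new_children.append(Node("text", text=child.text))
            new_children.extend(child.children)
        else:
            new_children.append(child)
    node.children = new_children
    return node


def first_paragraph_text(list_item):
    """A downstream consumer that assumes list items contain paragraphs."""
    for child in list_item.children:
        if child.tagname == "paragraph":
            return child.text
    raise LookupError("list_item has no paragraph child")
```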
Note: by "hidden", I am referring to the `hidden` field shown in the debug tab of the markdown-it demo, e.g.
{
  "type": "paragraph_open",
  "tag": "p",
  "attrs": null,
  "map": [
    0,
    1
  ],
  "nesting": 1,
  "level": 2,
  "children": null,
  "content": "",
  "markup": "",
  "info": "",
  "meta": null,
  "block": true,
  "hidden": true
}
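For comparison, markdown-it's own HTML renderer handles this by not emitting tags for tokens whose `hidden` field is true, which is why its output has no `<p>` in tight lists. A simplified Python sketch of that behaviour; `Token`, `render` and `list_tokens` are illustrative stand-ins (newlines and inline rendering omitted), not the markdown-it-py API.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Simplified markdown-it token: only the fields used below."""
    type: str
    tag: str
    nesting: int  # 1 = open, -1 = close, 0 = self-contained (e.g. inline)
    content: str = ""
    hidden: bool = False


def render(tokens):
    out = []
    for tok in tokens:
        if tok.hidden:
            # Tight-list paragraphs: skip the <p>/</p> tags, content survives.
            continue
        if tok.nesting == 1:
            out.append(f"<{tok.tag}>")
        elif tok.nesting == -1:
            out.append(f"</{tok.tag}>")
        else:
            out.append(tok.content)
    return "".join(out)


def list_tokens(items, tight=True):
    """Build the token stream for a bullet list (hypothetical helper)."""
    tokens = [Token("bullet_list_open", "ul", 1)]
    for text in items:
        tokens += [
            Token("list_item_open", "li", 1),
            Token("paragraph_open", "p", 1, hidden=tight),
            Token("inline", "", 0, content=text),
            Token("paragraph_close", "p", -1, hidden=tight),
            Token("list_item_close", "li", -1),
        ]
    tokens.append(Token("bullet_list_close", "ul", -1))
    return tokens


print(render(list_tokens(["a", "b"])))  # <ul><li>a</li><li>b</li></ul>
```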
One possibility, for here and #534, is to indeed add hidden=True to the paragraph, then have a configuration option for removing these nodes, BUT with no guarantee that this will not break docutils/sphinx.
Another stopgap is to change the CSS styling so that the outcome visually matches expectations, although semantically it is a bit different (is the item content data or text?). I believe the spread flag in mdast handles this for listItems.
li p, table p {
    margin: 0px;
}
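The `spread` idea maps onto CommonMark's tight/loose distinction: only loose lists wrap item content in `<p>`. A deliberately simplified sketch, assuming a flat list of single-line `- ` items where a blank line between items is the only thing that makes the list loose (CommonMark also counts blank lines between block elements inside one item, which this ignores). The function names are hypothetical.

```python
def list_is_spread(source: str) -> bool:
    """Simplified: a flat list is loose ("spread") if a blank line separates items."""
    lines = source.strip("\n").split("\n")
    return any(not line.strip() for line in lines)


def render_flat_list(source: str) -> str:
    """Render a flat '- item' list, wrapping items in <p> only when loose."""
    spread = list_is_spread(source)
    # Drop the "- " marker from every non-blank line.
    items = [line.strip()[2:] for line in source.split("\n") if line.strip()]
    if spread:
        body = "".join(f"<li><p>{item}</p></li>" for item in items)
    else:
        body = "".join(f"<li>{item}</li>" for item in items)
    return f"<ul>{body}</ul>"


print(render_flat_list("- a\n- b\n"))    # <ul><li>a</li><li>b</li></ul>
print(render_flat_list("- a\n\n- b\n"))  # <ul><li><p>a</p></li><li><p>b</p></li></ul>
```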
Thank you both for looking into this! I think I understand where this is coming from at a high level thanks to your explanation @chrisjsewell, but the nuances of the approaches are lost on me.
Researching this further, I turned to recommonmark’s implementation, which is what we were using before and which produced the expected output. It has a test case specifically for lists, with the whole Sphinx + docutils + commonmark bridging tested end-to-end: https://github.com/readthedocs/recommonmark/blob/0df398dbca128ce9d6f48f960bb40b9242642a5f/tests/test_sphinx.py#L147-L164
This seemed very promising… but upon cloning the project to run the test locally, I get a test failure, with the same output as MyST:
<ul class="simple">\n<li><p>Item A</p></li>\n<li><p>Item B</p></li>\n<li><p>Item C</p></li>\n</ul>
I managed to get the desired output with:
- sphinx==1.8.6 and any recent version of docutils
- docutils==0.12 and any version of Sphinx
So should I head off to report this elsewhere? It’s unclear to me why the output has changed, but I presume knowing what changed will be a good indication as to how to fix this.
Looks like this got reported for sphinx 2.0b1, which they decided to fix by adding CSS in one theme 😐.
Testing this with rst2pseudoxml and rst2html, I get the same AST output as you, but can confirm the HTML does not contain the paragraph tags.
So this increasingly looks like a Sphinx-specific issue? It’s unclear to me why this came up specifically as part of our switch from recommonmark to MyST, but I assume it’s a dependency management problem on our side.
Edit: yes, since MyST-Parser has an explicit dependency on sphinx>=3.1,<5, installing it overrides the sphinx==1.8.6 we had explicitly installed previously, which leads to this.
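To see why the pin could not survive: 1.8.6 falls outside the `sphinx>=3.1,<5` range, so satisfying MyST-Parser's requirement means replacing it. A small stdlib-only sketch of that range check; the helper names are hypothetical, and real tooling should use `packaging.specifiers.SpecifierSet`, which handles pre-releases and other PEP 440 details this toy parser ignores.

```python
def parse_version(text: str) -> tuple:
    """Turn '1.8.6' into (1, 8, 6); pre-release suffixes are not handled."""
    return tuple(int(part) for part in text.split("."))


def satisfies_myst_sphinx_range(version: str) -> bool:
    """Check the sphinx>=3.1,<5 range quoted above via tuple comparison."""
    v = parse_version(version)
    return (3, 1) <= v < (5,)


print(satisfies_myst_sphinx_range("1.8.6"))  # False
print(satisfies_myst_sphinx_range("4.5.0"))  # True
```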