sphinx Add reStructuredText parsing functions to ``SphinxDirective``

As was colourfully espoused in #8039, it is harder than it ought to be to parse reStructuredText into nodes in a Sphinx directive.

This PR adds three new functions:

SphinxDirective.parse_content_to_nodes(), to parse the entirety of SphinxDirective.content
SphinxDirective.parse_text_to_nodes(), to parse a given string
SphinxDirective.parse_inline(), to parse text that is inline-only

Yet to finish is documentation and tests, but I am opening now for early feedback.

A

Jun 28 '24 23:06 AA-Turner

cheers @AA-Turner love it ❤️

Haven't looked through in detail yet, but one request... can we link this to https://github.com/sphinx-doc/sphinx/pull/12361, and have the new SphinxDirective methods "ask" the parser how it wants to do the parsing.

This would allow for e.g. MyST parser, or a different implementation of the rST parser to implement their own parsing logic

Jun 29 '24 00:06 chrisjsewell

Oh and also, I don't know if it was in your planning, but we could also add an inline parse method for SphinxRole

Jun 29 '24 00:06 chrisjsewell

inline parse method for SphinxRole

This isn't easily feasible, having had a quick go. It would be nice, but it in effect represents nested inline parsing.

can we link this to #12361, and have the new SphinxDirective methods "ask" the parser how it wants to do the parsing.

I think this is automatic, as we use the current parser (by using the state machine directly). Very happy for someone to take this on in a follow-up though, if the current implementation is found wanting.

A

Jul 02 '24 20:07 AA-Turner

I think this is automatic, as we use the current parser (by using the state machine directly)

This assumes that the parser has a state machine, which is actually just an implementation detail of docutils, especially things like memo in _fresh_title_style_context

myst-parser has to go through absolute hurdles to "pretend" it has one: https://github.com/executablebooks/MyST-Parser/blob/master/myst_parser/mocking.py

Very happy for someone to take this on in a follow-up though, if the current implementation is found wanting.

but yeh no problem I can do this

Jul 02 '24 21:07 chrisjsewell

@AA-Turner correct me if I'm wrong, but in this PR you have essentially changed all occurrences of self.state.nested_parse to self.parse_content_to_nodes?

But nested_parse defaulted to match_titles=False, whereas now parse_content_to_nodes uses match_titles=True. Is this intentional? because it is a pretty big change 😬

Jul 02 '24 21:07 chrisjsewell

@AA-Turner correct me if I'm wrong, but in this PR you have essentially changed all occurrences of self.state.nested_parse to self.parse_content_to_nodes?

But nested_parse defaulted to match_titles=False, whereas now parse_content_to_nodes uses match_titles=True. Is this intentional? because it is a pretty big change 😬

See #12503, which restores the status quo ante. I do think in general that titles ought to be allowed where possible, and that previously it may have been more a case of forgetting to allow them, but you make a good point that such a change should be made more deliberately.

A

Jul 02 '24 22:07 AA-Turner

I do think in general that titles ought to be allowed where possible

where possible maybe, but I do want to emphasise that you cannot nest sections inside other nodes (like admonitions), without breaking the structure of the doctree: https://gist.github.com/chrisjsewell/0c5827add50074fef0937e2543e955b4

this is also the case in other text formats like: https://github.com/jgm/djot/issues/213

what you could allow is headings that are not sections; and in fact in myst-parser, rather than just omit them, they are actually changed to rubrics, i.e. not structural headings
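
As a rough docutils-only sketch of that idea (not myst-parser's actual code), a heading inside an admonition can be represented by a rubric node rather than a section, which keeps the doctree structure valid. The heading_as_rubric helper is hypothetical.

```python
from docutils import nodes


def heading_as_rubric(title: str) -> nodes.rubric:
    """Hypothetical helper: build a non-structural heading."""
    # First argument is the raw source, second is the text content.
    return nodes.rubric(title, title)


admonition = nodes.admonition()
admonition += heading_as_rubric("Details")
admonition += nodes.paragraph(text="Body text.")
```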

Jul 02 '24 22:07 chrisjsewell

In my opinion, nested_parse_with_titles should probably be a private function in sphinx, as it is not really intended to be used by extensions or users, unless they really know what they are doing.

Given this, do you think that the new nested_parse_to_nodes ought also be private? I'm indifferent, but having a better API for parsing arbitrary content to nodes has long been a request (see #8039 and numerous others).

A

Jul 02 '24 22:07 AA-Turner

Given this, do you think that the new nested_parse_to_nodes ought also be private? parsing arbitrary content to nodes has long been a request

In general, when the content contains no section headings, I think it's absolutely fine, and this PR makes it that bit easier 👍

It's just when it comes to trying to nested parse section headings, that's when we really need to make sure people know what they are doing, because it can get quite nuanced; since sections are tightly-coupled to the structure of the document

Jul 02 '24 22:07 chrisjsewell
