strictdoc Unexpected restriction on specific RST directives / compatibility with Breathe Sphinx Plugin

Use case: include a Doxygen directive through the Breathe Sphinx extension, then run strictdoc export or strictdoc server.

error: problems when converting RST to HTML: :1: (ERROR/3) Unknown directive type "autodoxygenfile".

.. autodoxygenfile:: test.c
   :project: TEST

OK with --formats rst

I understand that the HTML export cannot render this (it could ignore the directive and warn), but the UI should still be able to edit those directives.

Apr 06 '23 07:04 elfman2

Hi @elfman2, thanks for reporting this. You are approaching StrictDoc with a use case that I have not thought about before. And I have just learned about Breathe, which certainly extends my horizon.

TL;DR to my answer below: Instead of ignoring the unsupported directives of Breathe, a better approach is to find a way to register them within StrictDoc's runtime.

I have checked how StrictDoc publishes the RST to HTML, and I am attaching the code at the bottom of this comment. The actual workhorse method is docutils's publish_parts. As it turns out, that method does not allow customization of what is seen as an error or warning, so I would say, controlling the interface of that function is almost impossible.

However, what is possible is that you can register directives and roles on the docutils level without using Sphinx at all:

https://docutils.sourceforge.io/docs/howto/rst-directives.html#register-the-directive
https://docutils.sourceforge.io/docs/howto/rst-roles.html#register-the-role

Having this in mind, I can think of making an experiment and trying to register Breathe to be recognized by StrictDoc natively. The only precondition for this to work is that Breathe's directives should not be hard-coded to anything in Sphinx, but only to Docutils. If this precondition was satisfied, you could have Breathe bridging Doxygen and Docutils/StrictDoc, not only Doxygen and Sphinx. If Breathe's directives are somehow dependent on Sphinx, one can investigate how much coupling is there and maybe suggest a patch or at least a discussion for Breathe to consider supporting itself on top of just Docutils, without Sphinx involved. I am kindly tagging here @michaeljones who seems to be the main developer of Breathe over the years.

I would be super interested to know if this would be possible because StrictDoc could certainly benefit from interfacing with Breathe, and I would be happy to make it a first-class feature.

Let me know if you feel like you could investigate this yourself. Otherwise, I cannot promise that I can get to this myself in the next few weeks.

    @staticmethod
    def write_with_validation(rst_fragment):
        # How do I convert a docutils document tree into an HTML string?
        # https://stackoverflow.com/a/32168938/598057
        # Use a io.StringIO as the warning stream to prevent warnings from
        # being printed to sys.stderr.
        # https://www.programcreek.com/python/example/88126/docutils.core.publish_parts
        warning_stream = io.StringIO()
        settings = {"warning_stream": warning_stream}

        try:
            output = publish_parts(
                rst_fragment, writer_name="html", settings_overrides=settings
            )
            warnings = (
                warning_stream.getvalue().rstrip("\n")
                if warning_stream.tell() > 0
                else None
            )
        except SystemMessage as exception:
            output = None
            warnings = str(exception)

        if warnings is not None and len(warnings) > 0:
            # A typical RST warning:
            # """
            # <string>:4: (WARNING/2) Bullet list ends without a blank line;
            # unexpected unindent.
            # """
            match = re.search(
                r".*<.*>:(?P<line>\d+): \(.*\) (?P<message>.*)", warnings
            )
            if match is not None:
                error_message = (
                    f"RST markup syntax error on line {match.group('line')}: "
                    f"{match.group('message')}"
                )
            else:
                error_message = f"RST markup syntax error: {warnings}"
            return None, error_message

        html = output["html_body"]

        return html, None

Apr 10 '23 13:04 stanislaw

Thank you for tagging me. I don't know that I follow everything but I think Breathe & Sphinx are quite interlinked at the moment. I do like the idea of only being dependent on docutils but the intention was to match the code documentation output of Sphinx's built in support for Python and other languages and that all uses custom nodes from Sphinx so it is quite embedded at the moment.

It would be an interesting area to explore. Either using only core docutils nodes or write some custom nodes of our own perhaps. Though I don't have any experience doing that.

We don't have a lot of developer time or money at the moment so progress is slow and this is unlikely to be tackled. I am currently attempting to write a new version of Breathe primarily using Rust to overcome some of the performance and memory issues that we've had. It is still in its early days though and the intention is to release it under the Parity license which is less permissive than Breathe's current BSD licensing and so will require a separate commercial license for non-open-source projects which I think might be the focus of the Strict Doc ecosystem (given the focused and technical problems that Strict Doc seems to be trying to solve.)

I hope some of that is useful. I don't know much about Strict Doc but having read a little of the documentation it seems like a really interesting project!

Apr 10 '23 14:04 michaeljones

Meanwhile, I replaced the .. autodoxygenfile ::<file> directive with $AUTODOXYGENFILE ::<file> in the .sdoc, which bypasses the RST directive validation checks. Then I post-process the exported RST before publishing HTML/PDF with Breathe/Sphinx.
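The post-processing step described above could look roughly like the sketch below. It is only an illustration: the function name restore_breathe_directives, the exact placeholder layout (one placeholder alone on a line) and the choice to pass the Breathe project name as a parameter are assumptions, not the script actually used in this workflow.

```python
import re

# Assumed placeholder layout: "$AUTODOXYGENFILE ::<file>" alone on a line.
PLACEHOLDER = re.compile(
    r"^(?P<indent>[ \t]*)\$AUTODOXYGENFILE[ \t]*::[ \t]*(?P<file>\S+)[ \t]*$",
    re.MULTILINE,
)


def restore_breathe_directives(rst_text: str, project: str) -> str:
    """Turn placeholders in exported RST back into Breathe directives."""

    def to_directive(match: re.Match) -> str:
        indent = match.group("indent")
        return (
            f"{indent}.. autodoxygenfile:: {match.group('file')}\n"
            f"{indent}   :project: {project}"
        )

    return PLACEHOLDER.sub(to_directive, rst_text)


# Hypothetical exported fragment, for illustration only.
exported = "Detailed design\n\n$AUTODOXYGENFILE ::test.c\n"
print(restore_breathe_directives(exported, project="TEST"))
```

Running this over the RST produced by --formats rst, before handing it to Sphinx, keeps the .sdoc free of directives that docutils does not know about.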

I was wondering why StrictDoc renders HTML through docutils instead of Sphinx (although, looking at the StrictDoc documentation, there are several references to Sphinx).

I like Sphinx as a StrictDoc 'backend', which enables various themes and possible extensions. But I'm still missing traceability matrices in the StrictDoc RST export.

Apr 11 '23 15:04 elfman2

Hey @michaeljones, thanks for coming back to us very quickly! Very interesting insights about Rust and licensing topics. StrictDoc has been Apache 2 from the very beginning, and with time, we started receiving attention from companies who are looking into distributing StrictDoc as part of their commercial products. @mettta and I have looked into the option of going down the path of Parity or *GPL licenses but decided against it, to keep StrictDoc very accessible to everyone (OSS, academia, commercial, single users as well as companies).

At the same time, the project has grown quite big, and we are running out of hands to support all the open issues with our limited spare time. We would be super happy to learn how to monetize our development and give StrictDoc a good boost, but for us, it is not clear how moving away from Apache 2 would make us closer to having StrictDoc development financially backed.

P.S. I could not find your email, so if you would want to exchange thoughts with me on the topic of licensing and OSS monetization, please write me an email (under my GitHub profile). This is a soft invitation, no pressure 😄

Apr 12 '23 11:04 stanislaw

Meanwhile, I replaced the .. autodoxygenfile ::<file> directive with $AUTODOXYGENFILE ::<file> in the .sdoc, which bypasses the RST directive validation checks. Then I post-process the exported RST before publishing HTML/PDF with Breathe/Sphinx.

As an easier workaround, I can suggest that we register an autodoxygenfile directive with docutils (like I explained in my previous answer to you). This directive will not be a real implementation, but it will simplify your workflow with StrictDoc. The directive can render some basic output in StrictDoc's HTML, saying that the content will be rendered correctly once exported to RST and then built with Sphinx.
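A minimal sketch of such a placeholder directive, assuming only docutils is installed. The class name AutodoxygenfilePlaceholder and the wording of the note are made up for illustration; the directive never calls Doxygen or Breathe, it only keeps publish_parts from reporting an unknown directive.

```python
from docutils import nodes
from docutils.core import publish_parts
from docutils.parsers.rst import Directive, directives


class AutodoxygenfilePlaceholder(Directive):
    """Stand-in for Breathe's autodoxygenfile (hypothetical helper)."""

    required_arguments = 1  # the source file name, e.g. test.c
    option_spec = {"project": directives.unchanged}
    has_content = False

    def run(self):
        # Emit a plain paragraph instead of real Doxygen output.
        note = (
            f"[autodoxygenfile: {self.arguments[0]}] "
            "rendered only after export to RST and a Sphinx/Breathe build."
        )
        return [nodes.paragraph(text=note)]


# Make the directive name known to docutils for this process.
directives.register_directive("autodoxygenfile", AutodoxygenfilePlaceholder)

rst_fragment = ".. autodoxygenfile:: test.c\n   :project: TEST\n"
html_body = publish_parts(rst_fragment, writer_name="html")["html_body"]
print(html_body)
```

Because the registration happens at the docutils level, the same fragment stays untouched in the RST export and is still picked up by Breathe when Sphinx builds it.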

A more interesting workaround, but one that I don't have time to work on myself, would be to look into Breathe's code and check how much coupling there is with respect to Sphinx. It is also possible that Breathe is too tightly coupled with Sphinx (like @michaeljones suggested), so this path could be a no-go.

I was wondering why StrictDoc renders HTML through docutils instead of Sphinx (although, looking at the StrictDoc documentation, there are several references to Sphinx).

It is a good question. As I explained in my previous answer, we are using the publish_parts method that helps us to render RST-to-HTML fragment-by-fragment. Our main format is SDoc, so what the overall parser does is that it first builds the StrictDoc Abstract Syntax Tree (AST) and then, when it finds a multiline text, it calls publish_parts on that text to convert it to HTML. To my potentially limited and outdated knowledge, Sphinx does not give you an equivalent function that allows you to programmatically publish a chunk of RST to a chunk of HTML. Instead, Sphinx works as a complete machine, transforming several RST files to Sphinx documentation. I would be happy to be challenged in regard to this, though. If you could explain to me, how I could access Sphinx API to only publish a chunk of RST to HTML, we could introduce an option to work with Sphinx, not Docutils.

Sphinx is mentioned in the documentation in a few places, indeed, but everything only works with a docutils subset of RST, not Sphinx subset of RST like you already noticed. Please share ideas of how this could be achieved, given the limitation I presented above.

I like Sphinx as a StrictDoc 'backend', which enables various themes and possible extensions.

This is an interesting use case which we have not considered until now. I have to ask: if you are exporting to Sphinx anyway, what makes StrictDoc attractive for you? Have you considered https://sphinx-needs.readthedocs.io/en/latest/? They also provide a Sphinx-native way of creating requirements documents.

I am open to supporting Sphinx as a backend natively, but we need to find a way to get there. For now, please give a summary of:

How you are using StrictDoc. Which features do you use and which do you skip?
How you are using Sphinx. What is in StrictDoc that prevents you from using Sphinx directly?

But I'm still missing traceability matrices in strictdoc RST export

I am missing them and many other things too 😄. The matrices are on the roadmap, but we haven't got to implementing them yet. My own use case is to have the matrices presented on dedicated HTML pages, but while doing that, I could also do the RST variant. There are a few ongoing threads that we have to resolve before we can get to the matrices, so I cannot promise you that it will be done in April.

Apr 12 '23 11:04 stanislaw

I was thinking of a hackathon project for my vacation, something that would not make me feel like I work on StrictDoc :) I chose to check if it is possible to combine StrictDoc and Sphinx and got somewhat optimistically pessimistic results.

These were my inputs:

@elfman2 would like to have Breathe working in StrictDoc, but StrictDoc uses pure Docutils, and so we get the "unknown directive" if we use any non-Docutils directive, including those from Breathe.
@michaeljones mentioned that he is not happy about the performance of Breathe. I was worried about this because, as explained in my previous comment, StrictDoc renders standalone fragments of RST to HTML, not whole documents and documentation trees as Sphinx does. RST-to-HTML fragment rendering is already known to be StrictDoc's performance bottleneck, see #138, so with Breathe in the toolchain, StrictDoc would get even slower on larger documents.
It is generally a nice idea to make StrictDoc compatible with Sphinx plugins, at least those that can work on a single directive-level and do not require any cross-fragment resolution of information.

My goals were:

Create a small Python program that, at first, exports one RST file to one HTML file. By doing this, learn how to trigger the Sphinx machinery from its Python API.
Try all possible hacks to reduce what Sphinx does and only focus on the essential RST-to-HTML conversion.
Use a Breathe directive to make sure that my optimizations still allow it to work.
Along the way, measure the performance of Sphinx itself, as well as what happens when Breathe is in the RST.

Results:

There is now a repo which successfully achieves all the goals and has some basic tests: https://github.com/stanislaw/sphinx-how-to-convert-standalone-rst-fragment-to-html.
I have implemented a Python program that supports 3 Sphinx builders:
- its native single_file_html builder
- its native single_file_html builder customized by me to do less copying of stuff not needed for the RST-to-HTML job. This one was an intermediate product of this work and is not very interesting for the findings below.
- The minimal builder where I tried to reduce all possible Sphinx magic.
The minimal builder:
- Reads the RST fragment from memory, not from a file as Sphinx does for all its builders.
- Stores the HTML fragment in a Python variable in memory instead of writing it to a file.
- Disables all Sphinx-specific transforms, which are many and include checks for cross-references, i18n translations, and many other things. Without them, the minimal builder still works perfectly as long as you keep the RST selection of directives conservative.

Important pieces of code:

The code of the Builder itself: https://github.com/stanislaw/sphinx-how-to-convert-standalone-rst-fragment-to-html/blob/5e64c7647fccc174d985cc563528a9aa68ee61b8/builders/minimal_builder.py#L37
The program that uses the builder: https://github.com/stanislaw/sphinx-how-to-convert-standalone-rst-fragment-to-html/blob/5e64c7647fccc174d985cc563528a9aa68ee61b8/generate_rst_fragment_to_html.py#L48.

My test fixture:

Hello **world**

.. doxygenfile:: imu.h
   :project: DO-178C

... 4 large paragraphs of random code from the internet ...

My results:

Converting my test fixture above 100 times from RST to HTML in memory using the minimal builder takes quite some time:

The execution time is: 5.577089031 (with Breathe)
The execution time is: 1.8110771249999997 (without Breathe)

(the numbers are seconds)

(My machine is a MacBook Pro (Retina, 15-inch, Mid 2015). My performance code is very naive and could be improved, but it still gives an idea of the order of magnitude.)
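For reference, a naive measurement of this kind can be sketched as below. This is not the code from the experimental repository: time_conversions and fake_convert are hypothetical names, and fake_convert merely stands in for whatever RST-to-HTML call is being measured (the minimal builder, or docutils's publish_parts).

```python
import time
from typing import Callable


def time_conversions(
    convert: Callable[[str], str], fragment: str, runs: int = 100
) -> float:
    """Return the wall-clock seconds spent converting `fragment` `runs` times."""
    start = time.perf_counter()
    for _ in range(runs):
        convert(fragment)
    return time.perf_counter() - start


def fake_convert(fragment: str) -> str:
    # Hypothetical stand-in for the real RST-to-HTML conversion call.
    return f"<p>{fragment}</p>"


elapsed = time_conversions(fake_convert, "Hello **world**")
print(f"The execution time is: {elapsed}")
```

Timing the whole loop rather than a single call smooths out one-off startup costs, which matters when comparing runs with and without a Breathe directive in the fragment.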

I am surprised by how slow Sphinx is even in a stripped-down configuration and only one RST fragment as an input. And indeed, with Breathe in a directive, the performance becomes even slower.

Where I see my experiment could be challenged:

I went quite far, cutting off all possible irrelevant code from the minimal builder. It should still be possible to reduce the code further, but at least the builder does its job in memory and no longer does any pickling of doctree to the file system.
I don't think I made any coding mistakes, but there is a chance that I am not using the Sphinx API in the best way. I did inspect the Sphinx internals as thoroughly as I could, but maybe a Sphinx expert could highlight something that escaped my experiment.

CONCLUSIONS:

It is certainly possible to make StrictDoc interface with Sphinx in a limited configuration that enables support of plugins like Breathe. In StrictDoc, I could enable a feature toggle that switches from Docutils RST to the Sphinx RST by using the cleaned up version of the code from this experimental repository.
For this workflow, I would expect very slow performance. "Slow" meaning waiting tens of seconds and minutes before a realistic medium-size project could be generated. In addition to the job that the StrictDoc machinery does, we would get Sphinx/Sphinx Builder/Breathe each eating a good chunk of performance time.
I am curious to know what makes Breathe slow, but didn't have time to understand what Breathe does.
As a possible workaround, I am curious to know if we, @elfman2, could look into finding a way to interoperate with Doxygen tree from the StrictDoc tree without pulling in Breathe and Sphinx. Would it make sense to focus on finding a way to combine the StrictDoc output and the Doxygen output in the produced HTML and PDF outputs?
- For HTML, I could imagine StrictDoc simply having a dedicated Doxygen link that would let a user jump to a corresponding Doxygen HTML tree? For PDF, I could imagine concatenating StrictDoc RST and Doxygen content as RST and making StrictDoc linking to the Doxygen parts correctly?

If you want to run my code, here is how I run it from the root of the repository:

invoke test-integration --focus builders/03_minimal_builder/01_runs_without_errors/test.itest

Follow the test script to run what the test runs in your IDE.

May 12 '23 11:05 stanislaw

I tried a workaround to move the doxygen directive to a separate .rst file, which is referenced by the index.rst. https://github.com/strictdoc-project/strictdoc-templates/compare/main...elfman2:elfman2/do178?expand=1#diff-92b2fe8063b666c0ed77df85ce364e55c173d65a31870646e72b7ab5149b0e85

Then I edited the .sdoc file in a text editor and added a ref link to it. https://github.com/strictdoc-project/strictdoc-templates/compare/main...elfman2:elfman2/do178?expand=1#diff-ae559c82e6bfe6d8d63c0907954b299900dc77a0e989d4786787f5485f08be2d

I obtained a nice result on Read the Docs: https://strictdoc-templates.readthedocs.io/en/latest/rst/software/requirements/SDD.html#detailed-design

But then, when I run strictdoc server, I get: error: problems when converting RST to HTML: :1: (ERROR/3) Unknown interpreted text role "ref". RST fragment: >>>

[imu] :ref:imu
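This error has the same cause as the unknown directive: ref is a Sphinx role, not a docutils one. A placeholder role could be registered at the docutils level, as in the hedged sketch below. The function name ref_placeholder is hypothetical, the backticks around the role target are assumed to have been lost in the fragment shown above, and no cross-reference is actually resolved; the target is only rendered as literal text.

```python
from docutils import nodes
from docutils.core import publish_parts
from docutils.parsers.rst import roles


def ref_placeholder(name, rawtext, text, lineno, inliner,
                    options=None, content=None):
    """Render :ref:`target` as literal text without resolving it."""
    return [nodes.literal(rawtext, text)], []


# Register the role under the name that Sphinx normally provides.
roles.register_local_role("ref", ref_placeholder)

html_body = publish_parts("[imu] :ref:`imu`", writer_name="html")["html_body"]
print(html_body)
```

With such a role in place, the link would still only work in the Sphinx build; inside StrictDoc's own HTML it would be inert text.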


May 13 '23 21:05 elfman2
