openfasttrace Support for RST importer

Description

These days more and more documents are written in reStructuredText (RST), especially in open source projects, whereas OFT currently expects documents to be written in Markdown. It would be great to meet users' expectations by adding support for RST, which is more powerful and supports more complex data structures. Despite the many similarities between the two formats, converting existing documents from one to the other takes a long time.

Intermediate solution

With the following modification, we can hack OFT into treating RST files as if they were MD files. However, because the heading formats differ, the tool is no longer able to detect the title.

diff --git a/importer/markdown/src/main/java/org/itsallcode/openfasttrace/importer/markdown/MarkdownImporterFactory.java b/importer/markdown/src/main/java/org/itsallcode/openfasttrace/importer/markdown/MarkdownImporterFactory.java
index d112694afba5..f1c068fe2a85 100644
--- a/importer/markdown/src/main/java/org/itsallcode/openfasttrace/importer/markdown/MarkdownImporterFactory.java
+++ b/importer/markdown/src/main/java/org/itsallcode/openfasttrace/importer/markdown/MarkdownImporterFactory.java
@@ -11,7 +11,7 @@ public class MarkdownImporterFactory extends RegexMatchingImporterFactory
     /** Creates a new instance. */
     public MarkdownImporterFactory()
     {
-        super("(?i).*\\.markdown", "(?i).*\\.md");
+        super("(?i).*\\.markdown", "(?i).*\\.md", "(?i).*\\.rst");
     }

     @Override
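
To illustrate what the added pattern does, here is a minimal sketch in plain Java. It reuses the three pattern strings from the patch. The class name `FileNamePatternSketch` and the method `accepts` are hypothetical, and the sketch assumes that a path is accepted when any pattern matches the whole path; how `RegexMatchingImporterFactory` actually applies the patterns is not shown in this thread.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch, not OFT code: checks file paths against the patched patterns. */
final class FileNamePatternSketch
{
    // Same pattern strings as in the patched constructor call.
    static final List<Pattern> PATTERNS = List.of(
            Pattern.compile("(?i).*\\.markdown"),
            Pattern.compile("(?i).*\\.md"),
            Pattern.compile("(?i).*\\.rst"));

    /** Assumption: a path is accepted when any pattern matches it in full. */
    static boolean accepts(final String path)
    {
        return PATTERNS.stream().anyMatch(pattern -> pattern.matcher(path).matches());
    }
}
```

Under that assumption, `accepts("doc/spec.rst")` and `accepts("doc/SPEC.RST")` both hold because of the `(?i)` flag, while `accepts("doc/spec.rst.bak")` does not.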

Feb 09 '24 10:02 orzelmichal

While Markdown is in general still more popular due to its simplicity, RST has become the de facto standard for documentation in Python projects. That alone is already a very good reason to support it.

Preparation

First order of business: checking RST parsers.

So far OFT has no external dependencies (with the exception of the Java runtime). We need to decide:

a) we keep it this way → we need to write the parser ourselves
b) we accept the external dependency and take an existing one

Criteria for an RST parser we could accept:

No transitive dependencies
Regular releases
No unpatched CVEs
Proper test coverage
Decent code quality

Feb 12 '24 12:02 redcatbear

@orzelmichal, I did some research. I found no active Java project that provides an RST parser. There are two abandoned projects, but that's it.

So I will have to write a parser that is based on the one we use for Markdown. In fact, for the most part it will probably be the same code. My goal here is not to cover the whole feature set of RST, but only a very limited subset that makes defining requirements possible. For instance, support for extensions is definitely out of scope.

In our discussions you mentioned that headlines were the one thing that has given you trouble so far, so that's what I will put my main focus on. The underlined headline style exists in Markdown too, but I have not supported it in the Markdown parser so far, since there is a more commonly used alternative that is also easier to parse. I am guessing that in the end the Markdown parser will gain the capability to parse that headline style too, even if I have yet to see someone use it. Anyway, that's a nice side effect.

Feb 13 '24 07:02 redcatbear

@redcatbear Thanks for the investigation. Yes, the headline would be a good starting point. Markdown supports only = and -, whereas RST supports several different characters (some of which are very rarely used), such as =, -, ^, ", and ~.

Feb 13 '24 13:02 orzelmichal

#384 adds support for underline-style titles to the Markdown importer, which means we have already cracked the tricky part. Next step: extracting common code.

Feb 14 '24 12:02 redcatbear

@orzelmichal, the code for underline-style headlines in Markdown is now on main. Using your patch to make the Markdown parser ingest RST files, you can already try this out. At the moment only --- and === underlines are supported. We will extend that in the RST importer, since RST allows a lot more than that.
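
For illustration only, underline-style title detection with a configurable character set could look roughly like the sketch below. This is not the code from #384. The class `UnderlineTitleSketch`, its methods, and both constants are hypothetical names, and the character sets are just the ones mentioned in this thread. The sketch also ignores any rules about how long the underline must be relative to the title.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not OFT code: finds underline-style titles in a list of lines. */
final class UnderlineTitleSketch
{
    /** Underline characters named in this thread for Markdown. */
    static final String MARKDOWN_UNDERLINE_CHARS = "=-";
    /** Underline characters named in this thread for RST (RST accepts more than these). */
    static final String RST_UNDERLINE_CHARS = "=-^\"~";

    /** True if the line is non-empty and repeats one single allowed character. */
    static boolean isUnderline(final String line, final String allowedChars)
    {
        final String trimmed = line.stripTrailing();
        if (trimmed.isEmpty() || allowedChars.indexOf(trimmed.charAt(0)) < 0)
        {
            return false;
        }
        return trimmed.chars().allMatch(c -> c == trimmed.charAt(0));
    }

    /** Returns every non-blank text line that is directly followed by an underline. */
    static List<String> findTitles(final List<String> lines, final String allowedChars)
    {
        final List<String> titles = new ArrayList<>();
        for (int i = 0; i + 1 < lines.size(); i++)
        {
            final String candidate = lines.get(i).strip();
            if (!candidate.isEmpty() && !isUnderline(candidate, allowedChars)
                    && isUnderline(lines.get(i + 1), allowedChars))
            {
                titles.add(candidate);
            }
        }
        return titles;
    }
}
```

Passing `MARKDOWN_UNDERLINE_CHARS` mimics the current --- and === behavior, while `RST_UNDERLINE_CHARS` additionally accepts a title underlined with, for example, ^^^^^. Keeping the allowed characters as a parameter is one way the common code could be shared between the two importers.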

Feb 15 '24 10:02 redcatbear