fpdf2 Markdown not escaping MD special characters

Error details When using markdown in fpdf2 it should respect escaped characters like underscores and asterisks. It does not currently.

Minimal code Please include some minimal Python code reproducing your issue:

from fpdf import FPDF

pdf = FPDF(format=(101.6, 152.4))
pdf.set_margin(2)
pdf.add_page()
pdf.set_font("Times", size=16)

txt = "**This** is a markdown test. Theres\__double underscores"
pdf.multi_cell(w=50, text=txt, align='L', new_x='RIGHT',
                new_y='TOP', border=0, markdown=True,)

pdf.output("md.pdf")

Should output "This is a markdown test. Theres__double underscores" but outputs Screenshot 2024-06-25 at 10 04 24 AM

Environment Please provide the following information:

Operating System: macOS 14.5 Apple Silicon M2 Mac Mini
Python version: 3.11.9
fpdf2 version used: 2.7.9

Jun 25 '24 14:06 exiva

Thanks for reporting this, and welcome to fpdf2, @exiva !

Well, the documentation doesn't promise that our markup parser would honor backslash escapes, so technically this isn't a bug... :wink:

it is a perfectly valid feature request, though! Any chance you might want to submit a PR?

Jun 25 '24 14:06 gmischler

If you want a head start, markdown markers are handled on fpdf _parse_chars() function.

https://github.com/py-pdf/fpdf2/blob/d574b0796d5f64aade3a6051554fe6708e2e320a/fpdf/fpdf.py#L3500-L3515

Jun 25 '24 14:06 andersonhc

Implemented in #1224

Jul 20 '24 05:07 gmischler

Hi.

I'd like to reopen this issue to further discuss this.

I may be mistaken, but the escaping implemented in PR #1224 does not seem to conform to the CommonMark standard.

You can test the rendering of \\**Lorem\\** \\\\__Ipsum\\\\__ --dolor-- in https://spec.commonmark.org/dingus/

The result is this:

Escaping double underscores in Markdown should be done this way: \_\_Ipsum\_\_

What do you think @gmischler, @exiva & @david-fed?

Aug 19 '24 11:08 Lucas-C

Escaping double underscores in Markdown should be done this way: \_\_Ipsum\_\_

In "standard" markdown (including commonmark), escaping individual marker characters is necessary because both a single character as well as a double character sequence can act as a marker.

In our own decidedly non-standard markdown implementation, all markers consist of double character sequences. This has the advantage of reducing conflicts with actual text, which is why I hope we'll stick to the concept for any future feature additions. It also means that it is fundamentally incompatible with every other markdown implementation out there (which could be seen both as an advantage or disadvantage). But the key consequence here is that there is no need - and no real reason - to adhere to any standard markdown escape specification. So as far as I am concerned, the current implementation is perfectly fine.

Added: The bug reported in #1236 obviously still needs to get fixed.

Aug 19 '24 13:08 gmischler

he key consequence here is that there is no need - and no real reason - to adhere to any standard markdown escape specification.

I do not quite agree with that 🙂

Yes, you are right, currently our Markdown implementation is relatively limited and far from following any standard.

However, I see several paths forward:

we aim to progressively get closer to the standard CommonMark spec, which is why I find it problematic that fpdf2 currently renders \\\\__Ipsum\\\\__ the way it does, as it is not compatible with the CommonMark spec on backslash escapes: https://spec.commonmark.org/0.31.2/#backslash-escapes
we drop the Markdown rendering feature in fpdf2 (in v2.8.0 ?), and simply recommend that users use mistletoe or any other Markdown-rendering library instead, because that's not really the goal of fpdf2 (which is what I do most of the time)
we continue to have a limited / "broken" implementation of Markdown in fpdf2

IMHO we should aim for option 1. or 2., but not for option 3., which is the least interesting for fpdf2 in the long term.

Aug 19 '24 16:08 Lucas-C

we aim to progressively get closer to the standard CommonMark spec, which is why I find it problematic that fpdf2 currently renders \\\\__Ipsum\\\\__ the way it does, as it is not compatible with the CommonMark spec on backslash escapes: https://spec.commonmark.org/0.31.2/#backslash-escapes

Given the fundamental conceptual incompabilities and the way we currently handle deprecated features (without a clear deprecation strategy/policy), this would take decades and they would never actually be the same. It is better to keep it clearly distinct from commonmark, to avoid the two getting confused.

we drop the Markdown rendering feature in fpdf2 (in v2.8.0 ?), and simply recommend that users use mistletoe or any other Markdown-rendering library instead, because that's not really the goal of fpdf2 (which is what I do most of the time)

we continue to have a limited / "broken" implementation of Markdown in fpdf2

Accept that our own markdown implementation is "special" (as mentioned, this actually has certain practical advantages vis-a-vis commonmark).
If someone really absolutely wants a standards compliant markdown feature, add a "commonmark" parameter in parallel, which uses one of the several open source Python commonmark parsers, offering full compability. We have discussed this option before and you seemed to agree that it is a good idea.

Aug 20 '24 05:08 gmischler

5. If someone really absolutely wants a standards compliant markdown feature, add a "commonmark" parameter in parallel, which uses one of the several open source Python commonmark parsers, offering full compability. We have discussed this option before and you seemed to agree that it is a good idea.

In that comment I wrote:

So I'm not really sure of the path forward regarding Markdown support...

😅

We already have some documentation on how to use a Markdown library combined with fpdf2: https://py-pdf.github.io/fpdf2/CombineWithMistletoeoToUseMarkdown.html

I'm really not sure that fpdf2 should include code specific to any Markdown renderer / CommonMark parser... And even less sure that we should support "our own Markdown flavor" AND also be compatible with CommonMark, with an optional dependency... This sounds a bit like feature creep to me.

But I'm honestly curious to know what end-users think about it:

should we keep our "very basic Markdown flavor"
should we get rid of it and better document how to use other Markdown libraries, that do a far better job than us, in conjunction with fpdf2?

Maybe we could start a GitHub discussion poll about this.

Aug 20 '24 07:08 Lucas-C

So I'm not really sure of the path forward regarding Markdown support...

Ah, so I was overinterpreting your statements back then. Sorry for that.

All the same, making our markdown implementation "more like commonmark" in any meaningful way would require to change the fundamental concept of its syntax: Namely to use double characters for all formatting markers. Given our general promise to stay backwards compatible, often at great cost, I don't see this as a viable option.

So if we do want a more standard implementation (which essentially equals some flavour of commonmark), then there is no real alternative to adding it as an alternative in parallel. If at the same time we don't want to bind ourselfes to any particular flavour/implementation of commonmark, maybe that would be another case for the data adapter concept? This one would need to be quite tightly integrated with our text preprocessing machinery, but I think it could reasonably be done.

Aug 20 '24 16:08 gmischler