Markdown to PDF Parsing - **Bold** =/= __Bold__
I'm exporting Markdown templates to PDF. Double-asterisk does not parse properly when in a [CENTER] environment. For some reason, double-underscore provides the expected result. I am using the default template.
The following minimal interview generates the unexpected result when pasted into a new playground on version 1.1.19, using a Docker container.
mandatory: True
question: Generate centered bold text with centered text on next line
attachments:
  - name: Template Test 1
    filename: template_test_1
    content: ${ template_1 }
  - name: Template Test 2
    filename: template_test_2
    content: ${ template_2 }
---
template: template_1
content: |
  [TIGHTSPACING]
  [CENTER] __Bold Centered__ [BR]
  (Centered)
This is the expected PDF output.
---
template: template_2
content: |
  [TIGHTSPACING]
  [CENTER] **Bold Centered** [BR]
  (Centered)
Double asterisk doesn't parse properly.
I am trying to get the result in template_1 described above, but (parsing error aside) the syntax is a bit of a kludge. Wouldn't specifying the formatting using [BOLDCENTER] and [CENTER] be more Docassemblish? Try as I might, I can't get the results of template_1 without a newline being inserted between the two lines. For example, the following does not work:
mandatory: True
question: Generate centered bold text with centered text on next line without a newline between
attachments:
  - name: Template Test 3
    filename: template_test_3
    content: ${ template_3 }
---
template: template_3
content: |
  [TIGHTSPACING]
  [BOLDCENTER] Bold Centered
  [CENTER] (Centered)
Is there a more "Docassemble-y" way of specifying the result in template_1?
The issue seems to be related to [CENTER] being on the same line as the emphasized text. The [TIGHTSPACING] directive is not needed to reproduce it. For example, template_4 below renders correctly, but template_5 doesn't:
mandatory: True
question: Generate centered bold text with centered text on next line without a newline between
attachments:
  - name: Template Test 4
    filename: template_test_4
    content: ${ template_4 }
  - name: Template Test 5
    filename: template_test_5
    content: ${ template_5 }
---
template: template_4
content: |
  [CENTER]
  **Bold Centered** [BR]
  (Centered)
Double asterisk parses properly here.
---
template: template_5
content: |
  [CENTER] **Bold Centered** [BR]
  (Centered)
Double asterisk does not parse properly here.
The reason for [CENTER] and other bracket expressions is that Markdown and Pandoc don't support this kind of formatting. So I use regular expressions to turn things like [CENTER] into LaTeX codes (or HTML codes in the HTML context). I can use LaTeX codes because Pandoc uses LaTeX to make PDFs, and it accepts LaTeX mixed in with Markdown, although there is some ambiguity when you try to mix them.
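To make the idea concrete, here is a minimal sketch of that kind of substitution. It is not the code docassemble actually uses: the pattern, the function name `center_to_latex`, and the choice of a `center` environment are all assumptions made for illustration.

```python
import re

# Hypothetical pattern (an assumption, not docassemble's real regex):
# "[CENTER]" followed by the rest of the paragraph, i.e. everything up
# to the next blank line or the end of the text.
CENTER_RE = re.compile(r'\[CENTER\]\s*(.+?)(?=\n\s*\n|\Z)', re.DOTALL)


def center_to_latex(text):
    """Wrap each [CENTER] paragraph in a LaTeX center environment."""
    def repl(match):
        body = match.group(1).strip()
        # The Markdown inside the paragraph is left untouched; it is up
        # to Pandoc to parse it once it sits between the LaTeX commands.
        return '\\begin{center}\n' + body + '\n\\end{center}'
    return CENTER_RE.sub(repl, text)


sample = "[CENTER] **Bold Centered** [BR]\n(Centered)\n\nNext paragraph."
print(center_to_latex(sample))
```

In this sketch the `**Bold Centered**` text ends up on its own line between the LaTeX commands, whether or not it started on the same line as `[CENTER]`. A substitution that handled whitespace or line breaks differently could leave the Markdown adjacent to the LaTeX code, which is where the ambiguity mentioned above would come into play.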
I don't have the resources to reinvent Markdown, Pandoc, and LaTeX and build my own plain-text-to-PDF system that is 100% robust. LaTeX is 6GB and has been in development for 40 years whereas I am just one person. So I'm just tapping into Pandoc and LaTeX and trying to make it possible for users to write Markdown and Markdown-ish text that converts to HTML as well as to PDF.
I may be able to fix this issue, but if you need to be particular about document formatting when using Markdown, the best thing to do is write raw LaTeX.
The other alternative is to use the docx template file system, which lets you put all the formatting details in a .docx file; unlike Markdown, that format is designed for typesetting.
No need for you to prioritize this issue. I'm bored so I might twiddle some bits and see if I can fix it. Looking at the code it's probably something in filter.py?