Markdown to PDF Parsing - **Bold** =/= __Bold__

Open patrickr81 opened this issue 5 years ago • 4 comments

I'm exporting Markdown templates to PDF. Double-asterisk bold (**text**) does not parse properly inside a [CENTER] environment, yet for some reason double-underscore bold (__text__) gives the expected result. I am using the default template.

The following minimal interview generates the unexpected result when pasted into a new playground on version 1.1.19, using a Docker container.

```yaml
mandatory: True
question: Generate centered bold text with centered text on next line
attachments:
  - name: Template Test 1
    filename: template_test_1
    content: ${ template_1 }
  - name: Template Test 2
    filename: template_test_2
    content: ${ template_2 }
---
template: template_1
content: |
  [TIGHTSPACING]
  [CENTER] __Bold Centered__ [BR]
  (Centered)

  This is the expected PDF output.
---
template: template_2
content: |
  [TIGHTSPACING]
  [CENTER] **Bold Centered** [BR]
  (Centered)

  Double asterisk doesn't parse properly.
```

patrickr81, Apr 30 '20 14:04

I am trying to get the result in template_1 above, but (parsing error aside) the syntax is a bit of a kludge. Wouldn't specifying the formatting with [BOLDCENTER] and [CENTER] be more Docassemblish? Try as I might, I can't reproduce the template_1 result without an extra blank line appearing between the two lines. For example, the following does not work:

```yaml
mandatory: True
question: Generate centered bold text with centered text on next line without a newline between
attachments:
  - name: Template Test 3
    filename: template_test_3
    content: ${ template_3 }
---
template: template_3
content: |
  [TIGHTSPACING]
  [BOLDCENTER] Bold Centered

  [CENTER] (Centered)
```

Is there a more "Docassemble-y" way of specifying the result in template_1?

patrickr81, Apr 30 '20 14:04

The issue seems to be related to [CENTER] being on the same line as the emphasized text; [TIGHTSPACING] is not needed to reproduce it. For example, template_4 below renders correctly, but template_5 doesn't:

```yaml
mandatory: True
question: Generate centered bold text with centered text on next line without a newline between
attachments:
  - name: Template Test 4
    filename: template_test_4
    content: ${ template_4 }
  - name: Template Test 5
    filename: template_test_5
    content: ${ template_5 }
---
template: template_4
content: |
  [CENTER]
  **Bold Centered** [BR]
  (Centered)

  Double asterisk parses properly here.
---
template: template_5
content: |
  [CENTER] **Bold Centered** [BR]
  (Centered)

  Double asterisk does not parse properly here.
```
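
Because template_4 (with [CENTER] alone on its line) is reported to render correctly while template_5 is not, one possible stopgap is to normalize templates so that [CENTER] always sits on its own line. This is a minimal sketch that assumes the observation above holds for other templates too; `put_center_on_own_line` is a hypothetical helper, not part of docassemble.

```python
import re

# Hypothetical helper, not part of docassemble. Matches [CENTER] at the
# start of a line when it is followed by spaces/tabs and then more text.
_CENTER_WITH_TEXT = re.compile(r'^\[CENTER\][ \t]+(?=\S)', re.MULTILINE)


def put_center_on_own_line(template: str) -> str:
    """Move text that follows [CENTER] on the same line onto the next line,
    mirroring the layout of template_4 (an assumed workaround)."""
    return _CENTER_WITH_TEXT.sub('[CENTER]\n', template)


if __name__ == '__main__':
    template_5 = "[CENTER] **Bold Centered** [BR]\n(Centered)\n\nBody text."
    print(put_center_on_own_line(template_5))
```

Lines where [CENTER] already stands by itself, as in template_4, are left unchanged.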

patrickr81, Apr 30 '20 16:04

[CENTER] and the other bracket expressions exist because Markdown and Pandoc don't support this kind of formatting. I use regular expressions to turn things like [CENTER] into LaTeX code (or into HTML code in the HTML context). I can use LaTeX because Pandoc uses LaTeX to make PDFs and accepts LaTeX mixed in with Markdown, although there is some ambiguity when you try to mix them.
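
As a rough illustration of that approach, here is a sketch of rewriting bracket markup into LaTeX with regular expressions before Pandoc sees the text. It is not the code in docassemble's filter.py; the function name `markup_to_latex`, the "center until the next blank line" rule, and the exact LaTeX emitted are all assumptions made for illustration.

```python
import re


def markup_to_latex(text: str) -> str:
    """Hypothetical converter for two bracket codes (not docassemble's code)."""
    # Assumption: [BR] becomes a LaTeX forced line break (two backslashes).
    text = text.replace('[BR]', '\\\\')
    # Assumption: [CENTER] opens a center environment that runs until the
    # next blank line (paragraph end) or the end of the text.
    return re.sub(
        r'\[CENTER\]\s*(.+?)(?:\n\n|\Z)',
        lambda m: '\\begin{center}\n' + m.group(1).strip() + '\n\\end{center}\n\n',
        text,
        flags=re.DOTALL,
    )


if __name__ == '__main__':
    print(markup_to_latex("[CENTER] **Bold Centered** [BR]\n(Centered)\n\nBody text."))
```

In this sketch the **Bold Centered** emphasis is left as Markdown inside a LaTeX environment, which is the kind of mixed input described above as ambiguous for Pandoc.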

I don't have the resources to reinvent Markdown, Pandoc, and LaTeX and build my own plain-text-to-PDF system that is 100% robust. LaTeX is 6GB and has been in development for 40 years whereas I am just one person. So I'm just tapping into Pandoc and LaTeX and trying to make it possible for users to write Markdown and Markdown-ish text that converts to HTML as well as to PDF.

I may be able to fix this issue, but if you need to be particular about document formatting when using Markdown, the best thing to do is write raw LaTeX.
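
If you take the raw LaTeX route, one way to get the template_1 layout (a bold centered line directly followed by a centered line) is a single center environment with a forced line break between the two lines. This minimal Python sketch builds that LaTeX string; the helper names and the partial character escaping are assumptions, and the output is only meaningful in the PDF (LaTeX) context.

```python
# Hypothetical helpers, not part of docassemble.
# Escapes a few LaTeX special characters in user text; backslash, ~ and ^
# are NOT handled by this sketch.
_LATEX_ESCAPES = {'&': r'\&', '%': r'\%', '$': r'\$', '#': r'\#',
                  '_': r'\_', '{': r'\{', '}': r'\}'}


def _escape(text: str) -> str:
    return ''.join(_LATEX_ESCAPES.get(ch, ch) for ch in text)


def centered_bold_then_plain(bold_line: str, plain_line: str) -> str:
    """Return raw LaTeX: a bold centered line, a forced line break,
    then a centered plain line, all inside one center environment."""
    lines = [
        r'\begin{center}',
        r'\textbf{' + _escape(bold_line) + r'}\\',
        _escape(plain_line),
        r'\end{center}',
    ]
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    print(centered_bold_then_plain('Bold Centered', '(Centered)'))
```

Keeping both lines in one environment, rather than two separate centered paragraphs, is presumably what avoids the extra vertical space seen with [BOLDCENTER] followed by [CENTER].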

The other alternative is the docx template file system, which lets you put all the formatting details in a .docx file; unlike Markdown, the .docx format is designed for typesetting.

jhpyle, Apr 30 '20 16:04

No need for you to prioritize this issue. I'm bored, so I might twiddle some bits and see if I can fix it. Looking at the code, it's probably something in filter.py?

patrickr81, May 01 '20 00:05