
Add basic "rich-text" features to texts

Open M-Biggles opened this issue 1 year ago • 2 comments

It would be very handy to be able to create headings, indent, align, use bold/italic/underline/etc in the texts we create/read.

M-Biggles, Feb 01 '24 20:02

This is a nice idea, and it has the same challenges as #169 (add images) - the parsers would need to distinguish between words and presentation tokens. Thanks for the issue, cheers.

jzohrab, Feb 02 '24 11:02

I was looking into adding images to texts, which relates to this issue. The easiest way I can think of to do this would be to use Markdown as the text specification (e.g. `**bold**` is rendered as bold text here in GitHub, etc.), but that creates some problems when applied to existing non-markdown texts.

With e.g. the MIT-licensed library mistletoe, it's possible to parse markdown. For example, this document

  # TESTING
  
  **Here** is some stuff.
  
  ```
  python3.8 -m venv .venv
  ```
  
  trying out mistletoe -- [link](https://github.com/miyuchina/mistletoe?tab=readme-ov-file)
  
  Here is some text that is broken up into shorter lines
  but I'm pretty sure that markdown will combine them into one.
  So for book imports, this might not work well.
  Because.
  These might all be paragraphs.
  
  This would def be a separate paragraph.
  
  ---
  
  page 2?
  
  ```
  MacBook-Pro:zz_mistletoe_test jeff$ source .venv/bin/activate
  (.venv) MacBook-Pro:zz_mistletoe_test jeff$ pip3 install mistletoe
  ```

gives this AST (abstract syntax tree):
{
  "type": "Document",
  "footnotes": {},
  "line_number": 1,
  "children": [
    {
      "type": "Heading",
      "line_number": 1,
      "level": 1,
      "children": [
        {
          "type": "RawText",
          "content": "TESTING"
        }
      ]
    },
    {
      "type": "Paragraph",
      "line_number": 3,
      "children": [
        {
          "type": "Strong",
          "children": [
            {
              "type": "RawText",
              "content": "Here"
            }
          ]
        },
        {
          "type": "RawText",
          "content": " is some stuff."
        }
      ]
    },
    {
      "type": "CodeFence",
      "line_number": 5,
      "language": "",
      "children": [
        {
          "type": "RawText",
          "content": "python3.8 -m venv .venv\n"
        }
      ]
    },
    {
      "type": "Paragraph",
      "line_number": 9,
      "children": [
        {
          "type": "RawText",
          "content": "trying out mistletoe -- "
        },
        {
          "type": "Link",
          "target": "https://github.com/miyuchina/mistletoe?tab=readme-ov-file",
          "title": "",
          "children": [
            {
              "type": "RawText",
              "content": "link"
            }
          ]
        }
      ]
    },
    {
      "type": "Paragraph",
      "line_number": 11,
      "children": [
        {
          "type": "RawText",
          "content": "Here is some text that is broken up into shorter lines"
        },
        {
          "type": "LineBreak",
          "content": "",
          "soft": true
        },
        {
          "type": "RawText",
          "content": "but I'm pretty sure that markdown will combine them into one."
        },
        {
          "type": "LineBreak",
          "content": "",
          "soft": true
        },
        {
          "type": "RawText",
          "content": "So for book imports, this might not work well."
        },
        {
          "type": "LineBreak",
          "content": "",
          "soft": true
        },
        {
          "type": "RawText",
          "content": "Because."
        },
        {
          "type": "LineBreak",
          "content": "",
          "soft": true
        },
        {
          "type": "RawText",
          "content": "These might all be paragraphs."
        }
      ]
    },
    {
      "type": "Paragraph",
      "line_number": 17,
      "children": [
        {
          "type": "RawText",
          "content": "This would def be a separate paragraph."
        }
      ]
    },
    {
      "type": "ThematicBreak",
      "line_number": 19
    },
    {
      "type": "Paragraph",
      "line_number": 21,
      "children": [
        {
          "type": "RawText",
          "content": "page 2?"
        }
      ]
    },
    {
      "type": "CodeFence",
      "line_number": 23,
      "language": "",
      "children": [
        {
          "type": "RawText",
          "content": "MacBook-Pro:zz_mistletoe_test jeff$ source .venv/bin/activate\n(.venv) MacBook-Pro:zz_mistletoe_test jeff$ pip3 install mistletoe\n"
        }
      ]
    }
  ]
}
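
As a rough idea of how a parser could tell words apart from presentation tokens in a tree like this, here is a minimal sketch that walks the JSON and collects each `RawText` leaf together with the styling nodes it sits under. `text_runs` and `STYLE_TYPES` are hypothetical names made up for this example, not part of mistletoe or Lute, and the input is a trimmed copy of the first two blocks of the AST above.

```python
import json

# Node types from the AST above that carry presentation rather than words.
# (Assumption: this set is only what appears in the sample, not a full list.)
STYLE_TYPES = {"Heading", "Strong", "Link", "CodeFence"}

def text_runs(node, styles=()):
    """Yield (text, styles) for each RawText leaf under `node` (hypothetical helper)."""
    if node["type"] == "RawText":
        yield node["content"], styles
        return
    if node["type"] in STYLE_TYPES:
        styles = styles + (node["type"],)
    # Nodes without children (LineBreak, ThematicBreak) yield nothing here.
    for child in node.get("children", []):
        yield from text_runs(child, styles)

# Trimmed copy of the first two blocks of the AST above.
ast_json = """
{"type": "Document", "children": [
  {"type": "Heading", "level": 1, "children": [
    {"type": "RawText", "content": "TESTING"}]},
  {"type": "Paragraph", "children": [
    {"type": "Strong", "children": [
      {"type": "RawText", "content": "Here"}]},
    {"type": "RawText", "content": " is some stuff."}]}
]}
"""

runs = list(text_runs(json.loads(ast_json)))
for text, styles in runs:
    print(repr(text), styles)
```

The text of each run could then go to the language parser as usual, with the styles kept alongside it for rendering. Soft line breaks are dropped in this sketch, so a real version would need to decide what to do with them.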

from this code:

from mistletoe import Document, HtmlRenderer
from mistletoe.ast_renderer import AstRenderer

with open('README.md', 'r') as fin:
    with AstRenderer() as renderer:     # or: `with HtmlRenderer(AnotherToken1, AnotherToken2) as renderer:`
        doc = Document(fin)              # parse the lines into AST
        rendered = renderer.render(doc)  # render the AST
        # internal lists of tokens to be parsed are automatically reset when exiting this `with` block
        print(rendered)

For regular text-file book imports, this might cause a problem because, as shown above, markdown assumes that lines that aren't separated by at least one blank line are really one single paragraph. That would mean vastly different processing paths, depending on whether the imported book is markdown or any of the other formats (text, plaintext, epub, etc.).
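
To make the difference concrete, here is a small stdlib-only sketch comparing the two ways of splitting the same text. Both function names are hypothetical, and the "one line, one paragraph" rule for plain-text imports is an assumption for illustration, not a description of Lute's actual import code. The markdown version only approximates the blank-line rule and ignores everything else markdown does.

```python
import re

def plaintext_paragraphs(text):
    """Treat every non-empty line as its own paragraph (assumed plain-text rule)."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def markdown_paragraphs(text):
    """Merge consecutive lines until a blank line, roughly as markdown does."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(line.strip() for line in block.splitlines()) for block in blocks]

sample = (
    "Here is some text\n"
    "broken into lines.\n"
    "Because.\n"
    "\n"
    "This would def be a separate paragraph.\n"
)

print(len(plaintext_paragraphs(sample)))  # 4 paragraphs
print(len(markdown_paragraphs(sample)))   # 2 paragraphs
```

The same file ends up with four paragraphs under one rule and two under the other, which is the mismatch that would force separate processing paths.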

jzohrab, Mar 27 '24 08:03