Add basic "rich-text" features to texts
It would be very handy to be able to create headings, indent, align, use bold/italic/underline/etc in the texts we create/read.
This is a nice idea, and it has the same challenges as #169 (add images) - the parsers would need to distinguish between words and presentation tokens. Thanks for the issue, cheers.
I was looking into adding images to texts, which relates to this issue. The easiest way I can think of to do this would be to use Markdown as the text specification, e.g. `**bold**` is rendered as bold text here in GitHub, etc., but that creates some problems when applied to existing non-Markdown texts.
With e.g. the MIT-licensed library mistletoe, it's possible to parse Markdown. For example, this document:
# TESTING

**Here** is some stuff.

```
python3.8 -m venv .venv
```

trying out mistletoe -- [link](https://github.com/miyuchina/mistletoe?tab=readme-ov-file)

Here is some text that is broken up into shorter lines
but I'm pretty sure that markdown will combine them into one.
So for book imports, this might not work well.
Because.
These might all be paragraphs.

This would def be a separate paragraph.

---

page 2?

```
MacBook-Pro:zz_mistletoe_test jeff$ source .venv/bin/activate
(.venv) MacBook-Pro:zz_mistletoe_test jeff$ pip3 install mistletoe
```
gives this AST (abstract syntax tree):
{
  "type": "Document",
  "footnotes": {},
  "line_number": 1,
  "children": [
    {
      "type": "Heading",
      "line_number": 1,
      "level": 1,
      "children": [
        {
          "type": "RawText",
          "content": "TESTING"
        }
      ]
    },
    {
      "type": "Paragraph",
      "line_number": 3,
      "children": [
        {
          "type": "Strong",
          "children": [
            {
              "type": "RawText",
              "content": "Here"
            }
          ]
        },
        {
          "type": "RawText",
          "content": " is some stuff."
        }
      ]
    },
    {
      "type": "CodeFence",
      "line_number": 5,
      "language": "",
      "children": [
        {
          "type": "RawText",
          "content": "python3.8 -m venv .venv\n"
        }
      ]
    },
    {
      "type": "Paragraph",
      "line_number": 9,
      "children": [
        {
          "type": "RawText",
          "content": "trying out mistletoe -- "
        },
        {
          "type": "Link",
          "target": "https://github.com/miyuchina/mistletoe?tab=readme-ov-file",
          "title": "",
          "children": [
            {
              "type": "RawText",
              "content": "link"
            }
          ]
        }
      ]
    },
    {
      "type": "Paragraph",
      "line_number": 11,
      "children": [
        {
          "type": "RawText",
          "content": "Here is some text that is broken up into shorter lines"
        },
        {
          "type": "LineBreak",
          "content": "",
          "soft": true
        },
        {
          "type": "RawText",
          "content": "but I'm pretty sure that markdown will combine them into one."
        },
        {
          "type": "LineBreak",
          "content": "",
          "soft": true
        },
        {
          "type": "RawText",
          "content": "So for book imports, this might not work well."
        },
        {
          "type": "LineBreak",
          "content": "",
          "soft": true
        },
        {
          "type": "RawText",
          "content": "Because."
        },
        {
          "type": "LineBreak",
          "content": "",
          "soft": true
        },
        {
          "type": "RawText",
          "content": "These might all be paragraphs."
        }
      ]
    },
    {
      "type": "Paragraph",
      "line_number": 17,
      "children": [
        {
          "type": "RawText",
          "content": "This would def be a separate paragraph."
        }
      ]
    },
    {
      "type": "ThematicBreak",
      "line_number": 19
    },
    {
      "type": "Paragraph",
      "line_number": 21,
      "children": [
        {
          "type": "RawText",
          "content": "page 2?"
        }
      ]
    },
    {
      "type": "CodeFence",
      "line_number": 23,
      "language": "",
      "children": [
        {
          "type": "RawText",
          "content": "MacBook-Pro:zz_mistletoe_test jeff$ source .venv/bin/activate\n(.venv) MacBook-Pro:zz_mistletoe_test jeff$ pip3 install mistletoe\n"
        }
      ]
    }
  ]
}
from this code:
from mistletoe import Document, HtmlRenderer
from mistletoe.ast_renderer import AstRenderer

with open('README.md', 'r') as fin:
    with AstRenderer() as renderer:  # or: `with HtmlRenderer(AnotherToken1, AnotherToken2) as renderer:`
        doc = Document(fin)  # parse the lines into AST
        rendered = renderer.render(doc)  # render the AST
    # internal lists of tokens to be parsed are automatically reset when exiting this `with` block

print(rendered)
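
On the "words vs presentation tokens" question raised earlier, here is a rough sketch of how a parser could walk an AST shaped like the JSON above and keep the text separate from its styling. It only uses the standard library and works on the JSON output, not on mistletoe objects. The function name `iter_styled_text` and the `STYLE_NODES` labels are made up for illustration; they are not part of mistletoe.

```python
import json

# Hypothetical mapping from AST node types (as they appear in the JSON above)
# to style labels. The labels are illustrative only.
STYLE_NODES = {
    "Strong": "bold",
    "Emphasis": "italic",
    "Heading": "heading",
    "Link": "link",
}


def iter_styled_text(node, styles=()):
    """Yield (text, styles) pairs for every RawText node under `node`."""
    node_type = node.get("type")
    if node_type == "RawText":
        # Leaf node: this is the actual text a word parser should see.
        yield node["content"], styles
        return
    if node_type in STYLE_NODES:
        # Presentation node: remember the style, then descend.
        styles = styles + (STYLE_NODES[node_type],)
    for child in node.get("children", []):
        yield from iter_styled_text(child, styles)


# A small excerpt of the AST shown above.
ast = json.loads("""
{"type": "Document", "children": [
  {"type": "Paragraph", "children": [
    {"type": "Strong", "children": [{"type": "RawText", "content": "Here"}]},
    {"type": "RawText", "content": " is some stuff."}
  ]}
]}
""")

for text, styles in iter_styled_text(ast):
    print(repr(text), styles)
```

With something like this, the existing word parsers would only ever see the `RawText` content, and the styles could be reapplied at render time. Whether that fits the current parsing pipeline is an open question.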
For regular text-file book imports, this might cause a problem, because, as shown above, Markdown assumes that lines not separated by at least one blank line belong to a single paragraph. That would mean vastly different processing paths depending on whether the imported book is Markdown or anything else (plain text, epub, etc.).
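
One possible way to keep a single processing path, sketched under the assumption that imports go through an AST like the one above: treat soft `LineBreak` nodes as paragraph boundaries for plain-text imports, and as spaces for real Markdown. The function `paragraphs_from_ast` and its `split_on_soft_breaks` flag are hypothetical names, and the code works on the JSON structure only.

```python
def _plain_text(node):
    """Concatenate the RawText content found under `node`."""
    if node.get("type") == "RawText":
        return node["content"]
    return "".join(_plain_text(child) for child in node.get("children", []))


def paragraphs_from_ast(doc, split_on_soft_breaks=False):
    """Return paragraph strings from a Document AST dict.

    With split_on_soft_breaks=False, soft line breaks become spaces
    (the usual Markdown behaviour). With True, each soft break starts
    a new paragraph, which is closer to how a plain-text import
    treats single newlines.
    """
    paragraphs = []
    for block in doc.get("children", []):
        if block.get("type") != "Paragraph":
            continue  # headings, code fences, etc. are ignored in this sketch
        current = []
        for span in block.get("children", []):
            if span.get("type") == "LineBreak" and span.get("soft"):
                if split_on_soft_breaks:
                    paragraphs.append("".join(current))
                    current = []
                else:
                    current.append(" ")
            else:
                current.append(_plain_text(span))
        paragraphs.append("".join(current))
    return paragraphs


# Excerpt of the multi-line paragraph from the AST above.
doc = {
    "type": "Document",
    "children": [
        {
            "type": "Paragraph",
            "children": [
                {"type": "RawText", "content": "Because."},
                {"type": "LineBreak", "content": "", "soft": True},
                {"type": "RawText", "content": "These might all be paragraphs."},
            ],
        }
    ],
}

print(paragraphs_from_ast(doc))
print(paragraphs_from_ast(doc, split_on_soft_breaks=True))
```

This doesn't remove the difference between Markdown and non-Markdown sources, it just reduces it to one flag chosen at import time.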