markdown-it-py
Plaintext Renderer
Hello,
I've written a plaintext renderer that removes all markup. The inspiration for this is to facilitate NLP on a corpus of markdown documents. =)
Since I'm uncertain about a few things, I didn't want to make a PR just yet.
- Is it generally correct? I just learned about markdown-it-py, so I'm not that familiar with the code. I did test it on a few documents, at least.
- Do you think it's useful enough to include in the markdown-it-py repository, or as a plugin? (I'm not sure if renderers can be plugged in.)
- There's a dependency on markupsafe.striptags() - is that OK? Do you think there's a better way to deal with HTML tags?
Thanks for opening your first issue here! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).
Welcome to the EBP community! :tada:
hey @elespike yeh looking good thanks.
I would probably say that it would be a separate repository, to keep this just the core code, and would "advertise" it in the documentation (as with the Markdown renderer https://github.com/executablebooks/markdown-it-py/issues/10#issuecomment-692917352)
Then it's fine to have the markupsafe dependency.
Renderers are certainly pluggable in the API:
md = MarkdownIt("commonmark", renderer_cls=RendererPlain)
Not currently via the CLI though, but that could be easily added if there is interest.
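To illustrate the renderer_cls hook, here is a minimal sketch of a pluggable renderer. It is not the code from this issue: RendererPlainSketch is a hypothetical name, and the sketch assumes that a renderer only needs an `__output__` attribute, a constructor accepting the parser, and a `render(tokens, options, env)` method. It only handles a few token types and ignores HTML, images, and lists.

```python
from markdown_it import MarkdownIt


class RendererPlainSketch:
    """Hypothetical, simplified plain-text renderer (illustration only)."""

    __output__ = "plain"

    def __init__(self, parser=None):
        # The parser instance is passed in by MarkdownIt; unused here.
        pass

    def render(self, tokens, options, env):
        parts = []
        for token in tokens:
            if token.type == "inline":
                # Inline content lives in the children of "inline" tokens.
                for child in token.children or []:
                    if child.type in ("text", "code_inline"):
                        parts.append(child.content)
                    elif child.type in ("softbreak", "hardbreak"):
                        parts.append("\n")
            elif token.type in ("fence", "code_block"):
                parts.append(token.content)
            elif token.type in ("paragraph_close", "heading_close"):
                # End each paragraph or heading with a newline.
                parts.append("\n")
        return "".join(parts)


md = MarkdownIt("commonmark", renderer_cls=RendererPlainSketch)
text = md.render("Some **bold** and `code`.")
print(text)
```

Markup tokens such as strong_open and strong_close are simply skipped, which is what strips the emphasis markers from the output.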
The next step, I suggest, would be to add some test fixtures. See, for example, https://github.com/executablebooks/markdown-it-py/blob/master/tests/test_port/test_fixtures.py and https://github.com/executablebooks/markdown-it-py/blob/master/tests/test_port/fixtures/smartquotes.md; the fixture file defines a series of test cases, each with a description, the input Markdown, and the expected output text:
Test description
.
input **markdown**
.
expected output
.
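As a rough illustration of how such a fixture file could be consumed, the following stand-alone sketch splits the dot-delimited format into (description, input, expected) triples. The function name parse_fixtures is hypothetical, and this is not the helper that the repository's own tests use; it assumes each case follows exactly the layout shown above.

```python
def parse_fixtures(text):
    """Split fixture text into (description, input, expected) triples."""
    lines = text.splitlines()
    cases = []
    i = 0
    while i < len(lines):
        # Skip blank lines between test cases.
        if not lines[i].strip():
            i += 1
            continue
        description = lines[i].strip()
        i += 1
        if i >= len(lines) or lines[i] != ".":
            raise ValueError(f"expected '.' after description {description!r}")
        i += 1
        sections = []
        # Two dot-terminated sections follow: the input, then the expected output.
        for _ in range(2):
            body = []
            while i < len(lines) and lines[i] != ".":
                body.append(lines[i])
                i += 1
            if i >= len(lines):
                raise ValueError(f"unterminated section in {description!r}")
            i += 1  # Skip the closing "." line.
            sections.append("".join(line + "\n" for line in body))
        cases.append((description, sections[0], sections[1]))
    return cases


SAMPLE = """\
Test description
.
input **markdown**
.
expected output
.
"""

cases = parse_fixtures(SAMPLE)
print(cases)
```

Each triple could then be fed to a parametrized test that renders the input and compares the result with the expected text.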
Excellent, thanks! I did notice renderer_cls, so if that's the way then I'm all set =)
I'll do some bug fixing and add some tests, then create a repository :+1:
@chrisjsewell I've finally released the plaintext renderer: https://github.com/elespike/mdit_plain
Where can I advertise it in the docs? Doesn't seem that a section for 3rd party plugins already exists.
Awesome, cheers @elespike. No, there's not one yet, but I do want to add one now that we have mdit_plain (MD -> Plain), mdformat (MD -> MD), and myst-parser (MD -> Docutils/Sphinx). Feel free to make a PR with a section, or I'll do it at some point.
Cool, I'll see about adding a section - can't promise when =p
And just to confirm, the mdformat and myst-parser projects you mentioned are these ones, correct?
- https://github.com/executablebooks/mdformat
- https://github.com/executablebooks/myst-parser
I'll also close this issue unless you'd rather leave it until the docs PR goes through.