markdown-it-py

Plaintext Renderer

Open elespike opened this issue 4 years ago • 6 comments

Hello,

I've written a plaintext renderer that removes all markup. The inspiration for this is to facilitate NLP on a corpus of markdown documents. =)

Since I'm uncertain about a few things, I didn't want to make a PR just yet.

  1. Is it generally correct? I just learned about markdown-it-py, so I'm not that familiar with the code. I did test it on a few documents, at least.
  2. Do you think it's useful enough to include in the markdown-it-py repository, or as a plugin? (I'm not sure if renderers can be plugged in)
  3. There's a dependency on markupsafe.striptags() - is that ok? Do you think there's a better way to deal with HTML tags?
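On question 3, one possible dependency-free alternative is the standard library's `html.parser`. The following is only a rough sketch, not what the renderer in this issue does: the names `strip_tags` and `_TextOnly` are hypothetical, and unlike `striptags()` it does not collapse whitespace and would keep the text inside `<script>` or `<style>` elements.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores every tag (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html_fragment: str) -> str:
    """Return the text content of an HTML fragment with the tags removed."""
    parser = _TextOnly()
    parser.feed(html_fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


cleaned = strip_tags("<b>bold</b> &amp; <i>plain</i>")
```

Whether this is preferable to the markupsafe dependency depends on how much HTML the corpus actually contains.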

elespike, Dec 20 '20 06:12

Thanks for opening your first issue here! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out EBP's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
If your issue is a feature request, others may react to it, to raise its prominence (see Feature Voting).
Welcome to the EBP community! :tada:

welcome[bot], Dec 20 '20 06:12

Hey @elespike, yeah, looking good, thanks. I would suggest putting it in a separate repository, to keep this one to just the core code, and we would "advertise" it in the documentation (as with the Markdown renderer: https://github.com/executablebooks/markdown-it-py/issues/10#issuecomment-692917352). Then it's fine to have the markupsafe dependency.

Renderers are certainly pluggable in the API:

from markdown_it import MarkdownIt
md = MarkdownIt("commonmark", renderer_cls=RendererPlain)

They are not currently pluggable via the CLI, though that could easily be added if there is interest.

The next step, I suggest, would be to add some test fixtures. See, for example, https://github.com/executablebooks/markdown-it-py/blob/master/tests/test_port/test_fixtures.py and https://github.com/executablebooks/markdown-it-py/blob/master/tests/test_port/fixtures/smartquotes.md. The fixture file defines a series of Markdown inputs and their expected output texts, in this format:

Test description
.
input **markdown**
.
expected output
.
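For anyone unfamiliar with the layout, the fixture format above can be read with a few lines of Python. This is a sketch under assumptions, not the helper the project's test suite actually uses: `Fixture`, `parse_fixtures`, and `_read_until_dot` are hypothetical names, and it assumes every case is a title line followed by two sections, each terminated by a line containing only `.`.

```python
from typing import Iterator, List, NamedTuple


class Fixture(NamedTuple):
    title: str
    source: str    # input Markdown
    expected: str  # expected rendered output


def _read_until_dot(lines: Iterator[str]) -> str:
    """Consume lines up to the next lone '.' and return them as one string."""
    collected = []
    for line in lines:
        if line == ".":
            return "".join(item + "\n" for item in collected)
        collected.append(line)
    raise ValueError("fixture ended before its closing '.'")


def parse_fixtures(text: str) -> List[Fixture]:
    """Split a fixture file into (title, source, expected) cases."""
    fixtures = []
    lines = iter(text.splitlines())
    for line in lines:
        if not line.strip():
            continue  # blank lines between cases are skipped
        title = line.strip()
        if next(lines, None) != ".":
            raise ValueError(f"expected '.' after title {title!r}")
        source = _read_until_dot(lines)
        expected = _read_until_dot(lines)
        fixtures.append(Fixture(title, source, expected))
    return fixtures


sample = "Test description\n.\ninput **markdown**\n.\nexpected output\n.\n"
cases = parse_fixtures(sample)
```

Each parsed case could then be fed to a parametrized test that renders `source` and compares the result with `expected`.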

chrisjsewell, Dec 20 '20 18:12

Excellent, thanks! I did notice renderer_cls, so if that's the way then I'm all set =)

I'll do some bug fixing and add some tests, then create a repository :+1:

elespike, Dec 20 '20 20:12

@chrisjsewell I've finally released the plaintext renderer: https://github.com/elespike/mdit_plain

Where can I advertise it in the docs? It doesn't seem that a section for third-party plugins exists yet.

elespike, Dec 04 '21 18:12

Awesome, cheers @elespike. No, there isn't one yet, but I do want to add one now that we have mdit_plain (MD -> Plain), mdformat (MD -> MD), and myst-parser (MD -> Docutils/Sphinx). Feel free to make a PR with a section, or I'll do it at some point.

chrisjsewell, Dec 06 '21 10:12

Cool, I'll see about adding a section - can't promise when =p

And just to confirm, the mdformat and myst-parser projects you mentioned are these ones, correct?

  • https://github.com/executablebooks/mdformat
  • https://github.com/executablebooks/myst-parser

I'll also close this issue unless you'd rather leave it until the docs PR goes through.

elespike, Dec 07 '21 03:12