commonmark.js

Add "exclude" (and/or "include") option to parser options

Open EthanRutherford opened this issue 6 years ago • 2 comments

I'm working on a chat application, and want to offer a (quite conservative) subset of markdown, but it turns out to be next to impossible to find any way to do so without writing something from scratch.

It would be really great if it were possible to pass in a set of desired formats (and/or a set to exclude), and have the parser only parse the resulting set of formats.

For example, we would like to support only code_fences, code_spans, bold, and italic (think Slack, but CommonMark-compliant).

From checking the implementation, it seems to me that checking the options at construction and removing the functions for undesired formats from the parser instance would be a simple enough way to provide this functionality.
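
To make the request concrete, here is a rough sketch of the idea: a rule table that is filtered once, at construction, by `include`/`exclude` options. This is a toy, not the commonmark.js API. The option names, the rule names, and the regex-based rules are all assumptions made for illustration, and regex replacement is not how a CommonMark-compliant parser handles inlines.

```javascript
// Hypothetical sketch: NOT the commonmark.js API. The rule names and the
// `include` / `exclude` options are assumptions made for illustration.
const INLINE_RULES = {
  code_span: { pattern: /`([^`]+)`/g, render: (s) => `<code>${s}</code>` },
  bold: { pattern: /\*\*([^*]+)\*\*/g, render: (s) => `<strong>${s}</strong>` },
  italic: { pattern: /\*([^*]+)\*/g, render: (s) => `<em>${s}</em>` },
  link: {
    pattern: /\[([^\]]+)\]\(([^)]+)\)/g,
    render: (s, href) => `<a href="${href}">${s}</a>`,
  },
};

class SubsetParser {
  constructor({ include, exclude = [] } = {}) {
    const allowed = include ? new Set(include) : null;
    // Decide once which rules survive. The table's own order is kept so
    // that bold is always tried before italic.
    this.rules = Object.keys(INLINE_RULES)
      .filter((name) => (!allowed || allowed.has(name)) && !exclude.includes(name))
      .map((name) => INLINE_RULES[name]);
  }

  render(text) {
    // Apply each surviving rule in turn; disabled syntax stays literal.
    return this.rules.reduce(
      (out, rule) => out.replace(rule.pattern, (_match, ...groups) => rule.render(...groups)),
      text
    );
  }
}

const chat = new SubsetParser({ include: ['code_span', 'bold', 'italic'] });
console.log(chat.render('**hi** [x](http://e.com) `c`'));
```

With only those three rules enabled, the link syntax is passed through as plain text instead of becoming an anchor.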

I'd be willing to PR this myself if the feature is accepted.

EthanRutherford avatar Dec 29 '17 21:12 EthanRutherford

Another approach (possible with the current API) would be to postprocess the AST between parsing and rendering.

At this step you could, e.g., convert links to normal text, convert HTML to escaped text, and so on.

Depending on what you want to exclude, and why, this might be worth exploring.
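
As a rough illustration of that postprocessing step, the sketch below rewrites a tree between parsing and rendering: links are replaced by their contents and raw HTML becomes plain text that a renderer would escape. It works on plain objects whose `type` and `literal` fields only mirror the node type names; the tree shape and the `postprocess` function are assumptions, and a real implementation would use the library's own node and walker API instead.

```javascript
// Hypothetical sketch on a plain-object tree, not on commonmark.js nodes.
// Each node is { type, literal?, children? }; this shape is an assumption.
function postprocess(node) {
  if (node.type === 'html_inline') {
    // Raw inline HTML becomes a text node, so a renderer would escape it.
    return [{ type: 'text', literal: node.literal }];
  }
  if (node.type === 'html_block') {
    // A raw HTML block becomes a paragraph holding the same text.
    return [{ type: 'paragraph', children: [{ type: 'text', literal: node.literal }] }];
  }
  if (!node.children) {
    return [{ ...node }]; // leaf node: copy unchanged
  }
  const children = node.children.flatMap(postprocess);
  if (node.type === 'link') {
    // Drop the link wrapper but keep its visible contents.
    return children;
  }
  return [{ ...node, children }];
}

const doc = {
  type: 'document',
  children: [{
    type: 'paragraph',
    children: [
      { type: 'text', literal: 'see ' },
      { type: 'link', destination: 'http://e.com', children: [{ type: 'text', literal: 'here' }] },
      { type: 'html_inline', literal: '<b>' },
    ],
  }],
};
const [cleaned] = postprocess(doc);
console.log(JSON.stringify(cleaned, null, 2));
```

Because the function returns a list for every node, removing a wrapper (the link) and replacing a node (the HTML) both fall out of the same `flatMap` call.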

As for excluding different markdown syntax features piecemeal, I'm not too interested in that for this library, which is meant as a reference implementation for CommonMark.

jgm avatar Dec 30 '17 07:12 jgm

We implemented something like this in league/commonmark (a fork of this library for PHP) by decoupling the specific syntax features from the core engine.

We broke out all the parsers, processors, renderers, and node elements into separate classes. These individual components then get registered with an "environment" at construction time so that the core engine knows about them. (Multiple components can also be grouped into a single "extension" to make registering them easier.) The default environment is constructed with a "CommonMarkCoreExtension" which includes every feature defined by the CommonMark spec.

However, if a user only wants a subset of features, they can construct their own environment instead of using the default one and only register the individual components they need.
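
The registration pattern described above might look roughly like this when transposed to JavaScript. This is only a sketch of the shape of the design, not the league/commonmark API: the class names, the method names, and the regex-based parsers are all hypothetical stand-ins.

```javascript
// Hypothetical sketch of an environment with registered components.
// None of these names are taken from a real library.
class Environment {
  constructor() {
    this.inlineParsers = new Map(); // a Map keeps registration order
  }
  addInlineParser(name, parser) {
    this.inlineParsers.set(name, parser);
    return this; // allow chaining
  }
  addExtension(extension) {
    extension.register(this); // an extension registers several components
    return this;
  }
}

// Toy components; real parsers would be far more involved.
const codeSpanParser = (text) => text.replace(/`([^`]+)`/g, '<code>$1</code>');
const boldParser = (text) => text.replace(/\*\*([^*]+)\*\*/g, '<strong>$1</strong>');
const linkParser = (text) => text.replace(/\[([^\]]+)\]\(([^)]+)\)/g, '<a href="$2">$1</a>');

// Stand-in for a "core" extension that bundles every component.
const coreExtension = {
  register(env) {
    env
      .addInlineParser('code_span', codeSpanParser)
      .addInlineParser('bold', boldParser)
      .addInlineParser('link', linkParser);
  },
};

class Engine {
  constructor(env) {
    this.env = env;
  }
  convert(text) {
    // The engine only knows about whatever was registered.
    let out = text;
    for (const parse of this.env.inlineParsers.values()) out = parse(out);
    return out;
  }
}

const fullEngine = new Engine(new Environment().addExtension(coreExtension));
const chatEngine = new Engine(new Environment().addInlineParser('bold', boldParser));
console.log(fullEngine.convert('**b** [x](y)'));
console.log(chatEngine.convert('**b** [x](y)'));
```

The point of the design is that the engine has no built-in syntax: the default setup registers everything, and a restricted setup simply registers less.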

The underlying logic and execution flow of our PHP library is almost identical to this reference implementation, so it's definitely possible to refactor this library to support a piecemeal approach while fully supporting CommonMark, which our users seem to love.

However, I will admit that our approach has some rough edges and weird coupling in some areas, so it's not perfect. If a similar approach were adopted for this reference implementation, it would require significant effort to get it 100% right, which is something you'd expect from a reference implementation.

So from my perspective, refactoring to this approach probably won't happen unless supporting full extensibility is a priority and the maintainers/contributors have the bandwidth to do it properly.

colinodell avatar Dec 31 '17 15:12 colinodell