CommonMark.NET icon indicating copy to clipboard operation
CommonMark.NET copied to clipboard

Extensions

Open dmitry-shechtman opened this issue 8 years ago • 25 comments

@Knagis,

So I've had some solid progress on enabling extensions. Now I'd appreciate your feedback on pre-existing features in this regard.

  1. [x] Strikeout

    seems to me as an obvious candidate.

  2. [x] Line breaks

    I believe this one should also be an extension.

  3. [x] Unicode bullet

    I didn't see it mentioned anywhere in the spec. Extension?

  4. [x] HtmlFormatter + OutputDelegate

    Are two different formatters really required? AFAIK one of these was added as a means to enable customizations. Since we now have a modular design, no modifications to the bulk formatters should be required.

  5. [x] TrackSourcePosition

    The implementation would obviously remain deeply integrated, just the API would be a little cleaner, similarly to the reference label case extension.

  6. [ ] UriResolver

    Need I say more?

  7. [ ] Anything else?

Thanks

Edit: Added omissions and turned into checklist.

dmitry-shechtman avatar Dec 22 '15 06:12 dmitry-shechtman

Strikeout can be an extension, line breaks could be as well. The problem is though that currently it is very easy for the developer to discover these additional features that are supported by default (through the switch). It would be nice that the discovery could be made just as easy.

Unicode bullet is not in the spec (I haven't heard any good arguments against supporting it and I actually made a PR at one point that added it both to the spec and both reference implementations, the PR is still open). It could be an extension though it is a question of does it really have to be - maybe it is easier to just leave it as is.

HtmlFormatter was mainly created because people wanted to do things like <h1 class="header">foo</h1> with as little code as possible.

Knagis avatar Dec 22 '15 20:12 Knagis

All valid points. Let me address each separately (these could easily become own issues).

dmitry-shechtman avatar Dec 23 '15 02:12 dmitry-shechtman

Strikeout can be an extension, line breaks could be as well. The problem is though that currently it is very easy for the developer to discover these additional features that are supported by default (through the switch). It would be nice that the discovery could be made just as easy.

Sure, it's easy to discover Strikethrough when it's the only available additional feature ;)

Admittedly I didn't dedicate much thought to extension discoverability. All extensions currently reside in the Extension namespace, which may quickly become bloated as settings and features are added.

So, for example, there is a FancyLists extension with FancyListsSettings, which in turn has FancyListsFeatures, NumericListStyles and AdditiveListStyles.

One possible solution would be to use type nesting (FancyLists.Features etc.), although I'm personally leaning against it.

There's always the option of restricting customizability (Yes, I might have gotten carried away with the multitude of settings and features). That doesn't seem to work in the above case, however, as most scenarios won't require UpperLatin, ArabicIndic and Hebrew being always on.

The path of the least resistance (and my personal favorite) would be to leave things as are and refer users to documentation (once that's available).

dmitry-shechtman avatar Dec 23 '15 02:12 dmitry-shechtman

Unicode bullet is not in the spec (I haven't heard any good arguments against supporting it and I actually made a PR at one point that added it both to the spec and both reference implementations, the PR is still open).

I'm sorry to hear that your PR didn't get through.

There is indeed no reason why it shouldn't be added to the spec. However, I'm not sure having an always-on feature that isn't in the spec (yet?) is such a good idea.

FancyLists have 4 bullet list types (I used a different code point for disc), all off by default (although that in itself is subject to a separate discussion).

dmitry-shechtman avatar Dec 23 '15 02:12 dmitry-shechtman

HtmlFormatter was mainly created because people wanted to do things like <h1 class="header">foo</h1> with as little code as possible.

In the proposed design one would

  1. create a custom formatter that extends HeaderFormatter
  2. override WriteOpening()
  3. create a custom extension that initializes the formatter
  4. register the extension in settings.Extensions.

That might require a little more user code, but zero changes to the library code.

P.S. How much a performance gain did you get with the delegate array trick? I didn't apply to formatters yet, but I suspect that it would also be somewhat faster than a virtual method call for each element (especially inline).

dmitry-shechtman avatar Dec 23 '15 02:12 dmitry-shechtman

About the extension visibility - one idea I had - each extension would expose an extension method on the CommonMark namespace, so one could write something like:

using CommonMark;

var settings = CommonMarkSettings.Default.Clone();
settings.EnableFancyLists(some settings here as arguments);

This approach would enable other developers to write extensions that are external but for a developer those could be added the same way.

This would also enable you to separate the extension classes in separate namespaces for code clarity since the additional level would not add any extra work to the developer.

Knagis avatar Dec 23 '15 19:12 Knagis

P.S. How much a performance gain did you get with the delegate array trick? I didn't apply to formatters yet, but I suspect that it would also be somewhat faster than a virtual method call for each element (especially inline).

The original code was something like

if (c == '*' || c == '_') HandleEmphasis();
else if (c == '!') HandleExclamation();
etc...

The delegate array approach resulted in the same performance but with the added flexibility of changing the handle methods on the fly and without the need to hardcode settings checks in the parser method.

Knagis avatar Dec 23 '15 19:12 Knagis

However, I'm not sure having an always-on feature that isn't in the spec (yet?) is such a good idea.

It's not. I was always hoping that it would get to the spec sooner than later.

Knagis avatar Dec 23 '15 19:12 Knagis

HtmlFormatter

HtmlFormatter could go if the same can be achieved with different means. What I would like to keep is HtmlFormatterSlim since that produces the best performance.

What performance are you seeing with the extension model?

Knagis avatar Dec 23 '15 19:12 Knagis

Extension methods could be nice, but I see two problems with these:

  1. won't work in .NET 2.0
  2. would make the extension code less pretty.

I have found a workaround for 1, and 2 may be just my distorted point of view, but still.

dmitry-shechtman avatar Dec 23 '15 20:12 dmitry-shechtman

What I would like to keep is HtmlFormatterSlim since that produces the best performance.

I assumed as much. I'll then remove HtmlFormatter and rename HtmlFormatterSlim.

What performance are you seeing with the extension model?

I didn't measure yet, but I'm expecting that to be at least as good as that of the monolithic design (I applied the same trick to all parser methods, so instead of an if/else forest it's just a couple of delegate invocations - alright, more than a couple, but those should still perform much better than virtual calls.).

dmitry-shechtman avatar Dec 23 '15 21:12 dmitry-shechtman

The following exist in cmark:

  1. Safe (a.k.a. HTML ~~Sanitization~~Muting)

    Am I correct in my presuming that this wasn't ported yet?

  2. Smart (a.k.a Smart Punctuation)

    I am aware of the potential performance hit this might incur. Yet it's something that many users see as basic Markdown functionality (Smarty Pants has been around almost since the beginning).

  3. Tree (a.k.a. XML AST)

    A low priority one. Provided that Printer uses exactly the same tag names (does it?), this should be pretty straightforward.

dmitry-shechtman avatar Dec 23 '15 22:12 dmitry-shechtman

  1. won't work in .NET 2.0
  2. would make the extension code less pretty.

I don't think 1. is an issue - since most people even if they are forced to target .NET 2.0 will be using newer compilers that do support extension methods.

Why would it make it less pretty? In my mind the extension method was simply a shortcut for enabling the shortcut - if the dev wants he can still instantiate the extension and add it manually to the settings instance.

A similar approach is used with Owin in ASP.NET - the various frameworks add extension methods for IAppBuilder interface, so you end up simply calling app.MapSignalR().

PositionTracking

Perhaps, if that is indeed possible. Note #21 - if the implementation is changed drastically, this should be kept in mind - even if the implementation is not updated but at least to not make it harder to implement that down the road.

CustomFormatter

?

Safe (a.k.a. HTML Sanitization)

The implementation in cmark is not really HTML sanitization but rather disabling of HTML and unsafe URI schemes. I would prefer to use a separate external sanitization library that would actually process the HTML instead of prohibiting it. The problem is that the last time I looked, I could not find anything in the .NET world (MS had made one but deprecated it).

Smart

Yes, this one should be implemented. I once almost started but only almost.

Tree

It could be added and should be pretty easy to. Though I think I still prefer the plain text output in case I have to debug something. But there was at least one who was actually using that XML output in cmark for further processing.

Knagis avatar Dec 24 '15 11:12 Knagis

Why would it make it less pretty?

I was actually thinking about built-in extensions. Those would either need to open a separate namespace declaration, e.g.

namespace CommonMark
{
    public static partial class CommonMarkSettingsExtensions
    {
        public static void RegisterMathDollars(this CommonMarkSettings settings)
        {
            settings.Extensions.Register(new Extension.MathDollars(settings));
        }
    }
}

or a single file with a CommonMarkSettingsExtensions class would need to reference all built-in extensions at once (which is slightly prettier since I already have a RegisterAll() method that does the same).

I don't think 1. is an issue - since most people even if they are forced to target .NET 2.0 will be using newer compilers that do support extension methods.

This is a runtime issue rather than a compiler one. Trying to compile the above for .NET 2.0 results in

error CS1110: Cannot define a new extension method because the compiler required type 'System.Runtime.CompilerServices.ExtensionAttribute' cannot be found. Are you missing a reference to System.Core.dll?

As already mentioned, this is magically solved with a reference to LinqBridge. However, this begs the question: How many people actually use the .NET 2.0 version? Perhaps 2016 is the year it finally gets to retire?

dmitry-shechtman avatar Dec 24 '15 12:12 dmitry-shechtman

By CustomFormatter I meant OutputFormat.CustomDelegate. Sorry about the misnomer.

dmitry-shechtman avatar Dec 24 '15 12:12 dmitry-shechtman

The implementation in cmark is not really HTML sanitization but rather disabling of HTML and unsafe URI schemes. I would prefer to use a separate external sanitization library that would actually process the HTML instead of prohibiting it.

Yes, I noticed that after-the-factly. I think it's a matter of cmark compatibility. Surely CommonMark.NET is CommonMark-compliant, but it might look bad if it lacks an option some would consider a critical one. Maybe it should just ape cmark until a proper solution is found.

dmitry-shechtman avatar Dec 24 '15 12:12 dmitry-shechtman

The only reason I see to keep the 2.0 version is if someone somewhere wants to use it in an environment like SQL Server 2005 that allows to run managed code in its own process but only supports .NET 2.0. From what I read, we could add the attribute to the code the same as Lazy and Func.

There could be other reasons why for example JSON.NET still supports .NET 2.0.

OutputFormat.CustomDelegate

That is the current option (though ugly) to use custom formatters - the idea is that you could modify CommonMarkSettings.Default and thus change how another library that wraps commonmark.net is generating the output. It can go away as soon as there is a better way of specifying the formatter.

HTML sanitization

Yes, since there is no better solution for now, the simple disable-html flag could be the answer.

Knagis avatar Dec 24 '15 12:12 Knagis

From what I read, we could add the attribute to the code the same as Lazy and Func.

I tried that, it didn't work (although I might have done something wrong).

There could be other reasons why for example JSON.NET still supports .NET 2.0.

Sentimental value? :stuck_out_tongue:

Unlike JSON.NET, CommonMark.NET has the luxury of being in pre-release, which means it has much greater freedom in this regard.

Although I managed to work around missing .NET features so far, it feels like coding with one hand tied behind one's back. I recall at least ~~four~~five occurrences of adding/removing/adding/removing Func flavors.

dmitry-shechtman avatar Dec 24 '15 13:12 dmitry-shechtman

It can go away as soon as there is a better way of specifying the formatter.

Gone :sunglasses:

dmitry-shechtman avatar Dec 24 '15 13:12 dmitry-shechtman

Any chance you could add image size extensions?

https://github.com/jgm/CommonMark/wiki/Deployed-Extensions#image-size

Bryan-Legend avatar Aug 30 '16 18:08 Bryan-Legend

These are common requests I get in Markdown Edit for extensions:

  • Language specification in fenced code blocks for syntax highlighting

    ```C#
    
  • Generate header/image identifiiers for later styling

  • Checkboxes similar to GitHub

  • Tables

mike-ward avatar Sep 25 '16 12:09 mike-ward

Math formulas for Math Jax?

mdmoura avatar Nov 18 '16 18:11 mdmoura

TABLES

drusellers avatar Jan 20 '17 14:01 drusellers

  • [x] I'd love to have checkboxes!

If I was to implement such an extension myself, how should I go about doing it?

asbjornu avatar Sep 26 '18 10:09 asbjornu

Sorry for the late response. Overall my intention for a while now has been to update the readme to say that this package is deprecated and instead recommend to use https://github.com/lunet-io/markdig

The reason is that when I started work on porting cmark to .net I was under the impression that the standard will be constantly improved and things like tables will be added to it so that there would not be such a significant need for extensions. Unfortunately that did not turn out to be true, so the design of markdig that enables extensions is much more suitable for the current situation.

Knagis avatar Oct 08 '18 07:10 Knagis