Is there a way to render a summary by short circuiting or truncating the rendering?

Open haacked opened this issue 5 years ago • 3 comments

Is there an extension to render a summary of markdown text by truncating the content after a set number of words?

For example, suppose I'm building a site where people write articles in Markdown. But on the home page, I want to display the first 300 words of each article. I'd like to avoid rendering the whole thing and then parsing the HTML to find the first 300 words (while properly closing any open tags). It'd be nice if there were a way to do it as Markdig is parsing (or rendering) the Markdown.

If no such thing exists, I could try to implement this myself, but would appreciate some hints as to how I'd go about it.

haacked, Nov 04 '19 19:11

By far the simplest way is for you to find the first 300 words and pass that substring to Markdig. The downside is that if links are defined after that (as is common with Markdown where they are at the bottom), those wouldn't work. This could also break headings, emphasis text...
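A minimal sketch of that simple approach, for illustration. `NaiveSummary` and `TruncateWords` are made-up names, and a "word" is assumed to be any run of non-whitespace characters. The raw Markdown is cut in place so line breaks (and with them headings and paragraphs) before the cut survive.

```csharp
using System;
using Markdig;

public static class NaiveSummary
{
    // Returns the prefix of the raw Markdown that contains at most maxWords words.
    public static string TruncateWords(string markdown, int maxWords)
    {
        int words = 0;
        bool inWord = false;
        for (int i = 0; i < markdown.Length; i++)
        {
            bool isSpace = char.IsWhiteSpace(markdown[i]);
            if (!isSpace && !inWord)
            {
                // A new word starts here; cut before it if the budget is used up.
                if (words == maxWords)
                    return markdown.Substring(0, i).TrimEnd();
                words++;
            }
            inWord = !isSpace;
        }
        return markdown; // maxWords words or fewer: nothing to cut
    }

    // Renders only the truncated prefix.
    public static string Render(string markdown, int maxWords)
        => Markdown.ToHtml(TruncateWords(markdown, maxWords));
}
```

Calling `NaiveSummary.Render(article, 300)` shows the downside directly: a reference-style link whose definition sits below the cut is rendered as literal bracket text, and a cut inside `**bold**` leaves the emphasis unclosed.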

A much more correct approach, if the first one isn't suitable for you, is to edit the syntax tree to remove paragraphs. This is a better approach IMO. Ping me if you wanna go this route.
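A rough sketch of what editing the tree could look like, under simplifying assumptions: it only looks at top-level blocks, counts only literal text (so code blocks count as zero), and always keeps the first block. `BlockTrimmer` and `TrimTopLevelBlocks` are hypothetical names for illustration. Because the whole document is parsed before anything is removed, link definitions at the bottom are still seen by the parser.

```csharp
using System.Linq;
using Markdig;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;

public static class BlockTrimmer
{
    // Hypothetical helper: keeps leading top-level blocks while their literal
    // text fits into maxCharacters, then removes every block after that point.
    public static void TrimTopLevelBlocks(MarkdownDocument document, int maxCharacters)
    {
        int used = 0;
        int keep = 0;
        for (; keep < document.Count; keep++)
        {
            // Sum the length of all literal text nested anywhere in this block.
            int length = document[keep]
                .Descendants()
                .OfType<LiteralInline>()
                .Sum(literal => literal.Content.Length);

            // Always keep the first block so the summary is never empty.
            if (keep > 0 && used + length > maxCharacters)
                break;
            used += length;
        }

        // Remove from the end so the remaining indices stay valid.
        for (int i = document.Count - 1; i >= keep; i--)
            document.RemoveAt(i);
    }
}
```

Usage would be along the lines of `var document = Markdown.Parse(markdown); BlockTrimmer.TrimTopLevelBlocks(document, 200);` followed by rendering the document as usual. Trimming inside a block (mid-paragraph) needs a walk over the inlines and is not covered by this sketch.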

MihaZupan, Nov 04 '19 19:11

> A much more correct approach, if the first one isn't suitable for you, is to edit the syntax tree to remove paragraphs. This is a better approach IMO. Ping me if you wanna go this route.

I'm looking for a correct approach. This sounds like a good option.

haacked, Nov 04 '19 20:11

I wrote a sample trimming implementation. Right now it will try to keep as many elements as will fit into the limit and then start discarding elements.

You can change how new lines affect the limit. The current implementation limits the output based on character count. To change it to word count, you only have to change the TrimSpan implementation.
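The signature of TrimSpan isn't shown in this thread, so the following is not a drop-in replacement, only a sketch of what a word-based budget could look like on plain strings. `WordBudget.TakeWords` is a hypothetical helper, and a "word" is assumed to be any run of non-whitespace characters.

```csharp
using System;

public static class WordBudget
{
    // Returns the prefix of text that fits into the remaining word budget
    // and deducts the words it kept from wordsAvailable.
    public static string TakeWords(string text, ref int wordsAvailable)
    {
        bool inWord = false;
        for (int i = 0; i < text.Length; i++)
        {
            bool isSpace = char.IsWhiteSpace(text[i]);
            if (!isSpace && !inWord)
            {
                // A new word starts here; cut before it if no budget is left.
                if (wordsAvailable == 0)
                    return text.Substring(0, i).TrimEnd();
                wordsAvailable--;
            }
            inWord = !isSpace;
        }
        return text; // the whole text fit into the budget
    }
}
```

The budget is passed by `ref` so one counter can be shared across many pieces of text. One caveat: a word that is split across two inline elements (for example by emphasis in the middle of it) would be counted twice with this approach.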

I did a bit of testing, but there are probably some edge-cases I didn't test that won't count towards the limit properly.

```csharp
// pipeline setup
MarkdownDocument document = Markdown.Parse(markdown, pipeline);
TrimDocument(document, numberOfCharactersToKeep: 200);
// rendering to html
```
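For completeness, the two commented steps could be filled in roughly like this. It's a sketch, not the sample's actual code: `SummaryRenderer.RenderTrimmed` is a made-up name, the pipeline is assumed to use `UseAdvancedExtensions()`, and the `trim` delegate stands in for whatever trimming routine is used (TrimDocument itself isn't reproduced in this thread).

```csharp
using System;
using System.IO;
using Markdig;
using Markdig.Renderers;
using Markdig.Syntax;

public static class SummaryRenderer
{
    // Parses the Markdown, lets the caller edit the syntax tree, then renders HTML.
    public static string RenderTrimmed(string markdown, Action<MarkdownDocument> trim)
    {
        // Pipeline setup (assumption: the stock "advanced" extension set).
        MarkdownPipeline pipeline = new MarkdownPipelineBuilder()
            .UseAdvancedExtensions()
            .Build();

        MarkdownDocument document = Markdown.Parse(markdown, pipeline);

        // Edit the tree before rendering, e.g. by removing or shortening nodes.
        trim(document);

        // Rendering to HTML with the same pipeline that parsed the document.
        var writer = new StringWriter();
        var renderer = new HtmlRenderer(writer);
        pipeline.Setup(renderer);
        renderer.Render(document);
        writer.Flush();
        return writer.ToString();
    }
}
```

With the sample's function this would be called as something like `SummaryRenderer.RenderTrimmed(markdown, doc => TrimDocument(doc, numberOfCharactersToKeep: 200))`. Since the renderer only ever sees the edited tree, the output HTML stays well-formed without any post-processing of tags.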

There are some implementation details such as whether you want to cut a link in the middle. If not, change this to charactersAvailable -= autoLink.Url.Length.

MihaZupan, Nov 05 '19 15:11