TiddlyWiki5 icon indicating copy to clipboard operation
TiddlyWiki5 copied to clipboard

Revamp markdown plugin

Open cdruan opened this issue 2 years ago • 35 comments

Instead of merging markdown parse tree into TW tree, this PR feeds markdown-rendered HTML into TiddlyWiki for a final parse tree. This approach allows raw html inside the markdown text to be interpreted properly. To demo this PR or view test samples, visit new-markdown-plugin demo wiki; for comparison, here's the same wiki with the current markdown plugin

Other details

  • Replace remarkable with markdown-it parser for more robust parsing.

  • Extend syntax: strikethrough, insertion, mark, superscript, subscript, footnotes, definition lists.

  • Prevent accidental katex in expressions such as $5 and $6, <$list>...</$list>, $:/config%/PageControlButtons/Visibility/$:/core/ui/Buttons/home.

  • Ensure links to tiddlers work properly under markdown image and link syntax.

  • Add "syntax" section to user documentation.

  • Add support for "text/markdown" MIME type.

cdruan avatar Mar 13 '22 11:03 cdruan

I think most of the the "test tiddlers" from: https://cdruan.github.io/TiddlyWiki5 should go to the TW markdown edition.

pmario avatar Mar 13 '22 13:03 pmario

Replace remarkable with markdown-it parser for more robust parsing.

What did make you to switch to markdown-it from remarkable.

pmario avatar Mar 13 '22 13:03 pmario

As Jeremy pointed out the checklist plugin only adds visual "bells" without real functionality and it fails to print. see screenshot

There is a tw5-checklist plugin, ~~which I can't find at the moment~~, which actually allows users to change the state. It modifies the tiddler content.

grafik

pmario avatar Mar 14 '22 12:03 pmario

Hi @pmario and @Jermolene

Let me start by explaining why I made the switch to markdown-it. The decision was only made (reluctantly) after I found a series of problems with remarkable:

  • Remarkable does not handle raw <pre> and <style> blocks correctly when the blocks contain blank lines. This does not meet CommonMark's spec. See example #169:

    <pre language="haskell"><code>
    import Text.HTML.TagSoup
    
    main :: IO ()
    main = print $ parseTags tags
    </code></pre>
    okay
    

    The example rendered by remarkable:

    <pre language="haskell"><code>
    import Text.HTML.TagSoup
    <p>main :: IO ()
    main = print $ parseTags tags
    </code></pre>
    okay</p>
    

    Notice the extra <p> inside <pre> block and the spacing is no longer preserved.

  • In its rules_block/deflist.js module, the variable at line 98 should have been placed outside the loop. Checking with markdown-it-deflist confirms my finding. The consequence of that is the entire formatting of <dl> list is determined by the last term's spacing (tight or loose).

  • In its rules_block/htmlblock.js module, its HTML_TAG regular expressions (line 6 & 7) do not take into account of digits. So <h1>..<h6> are not recognized as block tags. So if you enter this:

    <h4>heading</h4><ul><li>item 1
    </li><li>item2</li></ul>
    

    Remarkable yields this:

    <p><h4>heading</h4><ul><li>item 1</p>
    </li><li>item2</li></ul>
    

    Not only is the <p> undesirable, its end tag cuts off the the <li> expression.

These problems altogether present a worrisome picture of a no longer well maintained project. Then, on markdown-it's github page:

markdown-it is the result of the decision of the authors who contributed to 99% of the Remarkable code to move to a project with the same authorship but new leadership (Vitaly and Alex). It's not a fork.

Having worked with both libraries, I can report that markdown-it has extra routines (such as Tokens and default token rendering) that make writing extensions less tedious.


As for the motivation of this PR, if you haven't done so already, please visit the demo using CURRENT markdown plugin and examine the tiddlers under Issues Fixed section. You'll see the issues I uncovered.

This PR loads successfully on both IE and ipad3—after I removed the let. There are some math expressions that could not be rendered in IE. On the other hand, the current plugin does not load successfully on either platforms. I suspect it may have to do with the remarkable-katex.js module not being es5 compliant.

cdruan avatar Mar 14 '22 23:03 cdruan

As for the motivation of this PR, if you haven't done so already, please visit the demo using CURRENT markdown plugin and examine the tiddlers under Issues Fixed section. You'll see the issues I uncovered.

I did see the fixes and I'm with you. There are definitely some problems with the remarkable library. Since I don't use markdown regularly, I don't have a problem to switch.

pmario avatar Mar 15 '22 00:03 pmario

checklist plugin removed

cdruan avatar Mar 15 '22 00:03 cdruan

One more thing, the immediate backward incompatibility I can see is that I removed the image parse rule from the renderWikiPragma. There was a debate whether prettylink should be included in the pragma or not. Since prettylink is still not in there and image rule provides the equivalent syntax for images, I took it out of the default setting... Or maybe both should be IN together?

cdruan avatar Mar 15 '22 01:03 cdruan

@cdruan ... Why did you close this one? ... I think it would be worth to be included.

pmario avatar Mar 30 '22 08:03 pmario

Hi @cdruan

@cdruan ... Why did you close this one? ... I think it would be worth to be included.

See https://github.com/Jermolene/TiddlyWiki5/pull/6528#discussion_r837977350

Jermolene avatar Mar 30 '22 19:03 Jermolene

@pmario and @Jermolene

I've rollbacked changes to parseutils.js. The entity decoding (of attribute values) is now performed on the TW parse-tree element and widget nodes—with the exception of <$wikify> and <$entity>—during the post-processing stage.

Other notable updates:

  • Add tw tag block parsing (similar to html blocks) to properly parse tw tags separated by blank lines.
  • Remove katex block parsing—I think inline display mode is sufficient. Block parsing only adds ability to include blank lines between $$...$$, but this feature is actually undesirable.
  • Add Extending Markdown demo tiddler to demonstrate the ability to extend the Markdown plugin.
  • Update/add few demo tiddlers.
  • Update user doc, mainly to specify inline math parsing rules.

Please let me know if you have any concerns. Thanks.

cdruan avatar Apr 06 '22 09:04 cdruan

Thanks @cdruan much appreciated.

I've rollbacked changes to parseutils.js. The entity decoding (of attribute values) is now performed on the TW parse-tree element and widget nodes—with the exception of <$wikify> and <$entity>—during the post-processing stage.

I'm sorry I didn't feedback quick enough on your earlier comments. The decoding of attributes must be applied consistently across all widgets. Arbitrary exceptions add significantly to the cognitive load of the system and future updates.

Is there another approach that doesn't affect wikitext parsing?

Jermolene avatar Apr 06 '22 10:04 Jermolene

Hi @Jermolene

Thanks for your feedback. I was hoping the disruption is deemed minimal because decoding is applied only to the string type of attribute values in markdown tiddlers. The likelihood of users issuing widget calls inside the markdown text is probably low.

Besides allowing consistent grammar across html elements and widgets, decoding the attributes allows me to pass LaTex and fence block markdown contents to TW via <$codeblock> and <$latex> widgets. (I can go into details on why I ultimately chose this strategy.) I also translate markdown links and images to <$link> and <$image> calls when appropriate and it'd be nice to have tooltips properly rendered when there are entities in the text. So I'm quite dependent on having a mechanism to decode entities in attribute string values.

I believe I found another widget, <$macrocall>, that requires delayed entity decoding... There are probably other exceptions. Would it be acceptable if the behavior is clearly documented? Alternatively, I could apply decoding only to the attributes of the aforementioned widgets (and html elements) that this PR depends on. Would that work? Personally, it would be nice to have a sure way of escaping quotes in the attribute values. Although delimiting with """ minimizes conflicts, theoretically, you can still have a string that contains """.

By the way, I just realized that current markdown plugin may be indirectly decoding entities in attribute values anyway. See example. For your reference, same markdown tiddlers rendered by this PR.

cdruan avatar Apr 07 '22 06:04 cdruan

@Jermolene ... This should also be considered for v5.2.3

pmario avatar Jul 08 '22 18:07 pmario

Thanks for your patience @cdruan

Jermolene avatar Jul 09 '22 07:07 Jermolene

Hi @cdruan - apologies for not noticing sooner – please could you resolve the merge conflict? Many thanks

Jermolene avatar Jul 09 '22 07:07 Jermolene

I've rollbacked changes to parseutils.js. The entity decoding (of attribute values) is now performed on the TW parse-tree element and widget nodes—with the exception of <$wikify> and <$entity>—during the post-processing stage.

I'm sorry I didn't feedback quick enough on your earlier comments. The decoding of attributes must be applied consistently across all widgets. Arbitrary exceptions add significantly to the cognitive load of the system and future updates.

Is there another approach that doesn't affect wikitext parsing?

@cdruan @Jermolene has this issue been resolved?

saqimtiaz avatar Jul 09 '22 09:07 saqimtiaz

--cdruan wrote-- I've rollbacked changes to parseutils.js. The entity decoding (of attribute values) is now performed on the TW parse-tree element and widget nodes—with the exception of <$wikify> and <$entity>—during the post-processing stage.

--Jermolene wrote-- I'm sorry I didn't feedback quick enough on your earlier comments. The decoding of attributes must be applied consistently across all widgets. Arbitrary exceptions add significantly to the cognitive load of the system and future updates.

Is there another approach that doesn't affect wikitext parsing?

--saqimtias wrote-- @cdruan @Jermolene has this issue been resolved?

I think there has been a misunderstanding in the first place. @cdruan did modify code in the parseutils.js which is part of the TW core. Mixing PRs in that way shouldn't happen. That's why the "offending" code was removed from this PR.

As far as I can see, this PR only contains code that is needed to handle markdown if the plugin is installed. It shouldn't interfere with the TW core code anymore.

There may still be a problem in parseutils.js, but this would be a different issue and a different PR.

pmario avatar Jul 18 '22 15:07 pmario

Sorry, I just saw the recent discussions.

I think we have yet to resolve the issue of decoding HTML entities inside widget and html tag's attribute string values. My current implementation is to decode entities inside attribute values of all html tags and widget tags, except <$wikify> and <$entity> and <$macrocall>. This only applies to markdown tiddlers.

However, I realized there's another approach if we could allow another string literal syntax, say: e"....string content...", similar to a raw string syntax in python, etc. Any strings of this form will have their entities decoded during parsing. This way, we could maintain backward compatibility while allowing content like this:

<$codeblock code=e"var s = &quot;foobar&quot;;"/>

And there won't be any seemingly arbitrary "exceptions" as before.

Would this new approach be acceptable?

cdruan avatar Jul 19 '22 09:07 cdruan

I'm not sure. Why is <$codeblock code="""var s = "foobar";;"""/> not good enough? It works out of the box.

pmario avatar Jul 19 '22 11:07 pmario

@pmario

Why is <$codeblock code="""var s = "foobar";;"""/> not good enough? It works out of the box.

This PR needs to be able to programmatically pass any content of a fenced code block to <$codeblock> widget. So it needs to assume that """ could be in the content.

cdruan avatar Jul 19 '22 12:07 cdruan

I could temporarily extend parseStringLiteral() to parse and decode entities inside e"...." strings and then restore the function to its original functionality once markdown parsing is done. (A version of this approach has been tested already.) So the extended string syntax will be an unpublished feature that only works within this markdown parser. This would be less drastic and no need to update TW5 doc.

cdruan avatar Jul 19 '22 12:07 cdruan

I've updated this PR with the approach described in this comment. I also made few additional updates to support the new "text/markdown" MIME type (commit 8e687eb).

The demo pages new vs. old have been updated as well, powered by 5.2.3-prerelease version.

cdruan avatar Jul 21 '22 11:07 cdruan

  • Extend syntax: strikethrough, insertion, mark, superscript, subscript, footnotes, definition lists.

Very appreciated update, thanks!

I see that a checklist plugin has been removed.

Is there any plan to support markdown checklists (TODOs), like here on GitHub?

- [ ] to be done
- [x] task is done

nicolnt avatar Jul 27 '22 08:07 nicolnt

Will the revamp also allow the plug-in to store meta file data in markdown front matter instead of needing two files per markdown tiddler?

kravlost avatar Jul 27 '22 08:07 kravlost

Will the revamp also allow the plug-in to store meta file data in markdown front matter instead of needing two files per markdown tiddler?

The markdown plugin handles how markdown tiddlers are rendered and displayed in TiddlyWiki, and is not related to loading/save files in the node.js configuration which is a core behaviour.

saqimtiaz avatar Jul 27 '22 08:07 saqimtiaz

Will the revamp also allow the plug-in to store meta file data in markdown front matter instead of needing two files per markdown tiddler?

The markdown plugin handles how markdown tiddlers are rendered and displayed in TiddlyWiki, and is not related to loading/save files in the node.js configuration which is a core behaviour.

#4702 says it would be possible with the new markdown plugin by adding the markdown-it-frontmatter plugin to the tw5 plugin. It would be great if this was supported.

kravlost avatar Jul 27 '22 08:07 kravlost

#4702 says it would be possible with the new markdown plugin by adding the markdown-it-frontmatter plugin to the tw5 plugin. It would be great if this was supported.

You are mistaken. As described in that comment, the markdown-it-frontmatter plugin can hide the frontmatter but this would have nothing to do with how the tiddlers are saved to disk and the .meta files created on node.js.

saqimtiaz avatar Jul 27 '22 09:07 saqimtiaz

#4702 says it would be possible with the new markdown plugin by adding the markdown-it-frontmatter plugin to the tw5 plugin. It would be great if this was supported.

You are mistaken. As described in that comment, the markdown-it-frontmatter plugin can hide the frontmatter but this would have nothing to do with how the tiddlers are saved to disk and the .meta files created on node.js.

Ah, OK. My mistake.

kravlost avatar Jul 27 '22 09:07 kravlost

After playing around with markdown tiddlers, I can tell that this PR fixes #6916 and #6919. Are there any blockers?

MaxGyver83 avatar Aug 25 '22 19:08 MaxGyver83

Thanks @MaxGyver83.

@cdruan is this PR ready for another review?

Jermolene avatar Aug 26 '22 14:08 Jermolene