TiddlyWiki5
TiddlyWiki5 copied to clipboard
Revamp markdown plugin
Instead of merging markdown parse tree into TW tree, this PR feeds markdown-rendered HTML into TiddlyWiki for a final parse tree. This approach allows raw html inside the markdown text to be interpreted properly. To demo this PR or view test samples, visit new-markdown-plugin demo wiki; for comparison, here's the same wiki with the current markdown plugin
Other details
-
Replace
remarkable
withmarkdown-it
parser for more robust parsing. -
Extend syntax: strikethrough, insertion, mark, superscript, subscript, footnotes, definition lists.
-
Prevent accidental katex in expressions such as
$5 and $6
,<$list>...</$list>
,$:/config%/PageControlButtons/Visibility/$:/core/ui/Buttons/home
. -
Ensure links to tiddlers work properly under markdown image and link syntax.
-
Add "syntax" section to user documentation.
-
Add support for "text/markdown" MIME type.
I think most of the the "test tiddlers" from: https://cdruan.github.io/TiddlyWiki5 should go to the TW markdown edition.
Replace remarkable with markdown-it parser for more robust parsing.
What did make you to switch to markdown-it from remarkable.
As Jeremy pointed out the checklist plugin only adds visual "bells" without real functionality and it fails to print. see screenshot
There is a tw5-checklist plugin, ~~which I can't find at the moment~~, which actually allows users to change the state. It modifies the tiddler content.
Hi @pmario and @Jermolene
Let me start by explaining why I made the switch to markdown-it. The decision was only made (reluctantly) after I found a series of problems with remarkable:
-
Remarkable does not handle raw
<pre>
and<style>
blocks correctly when the blocks contain blank lines. This does not meet CommonMark's spec. See example #169:<pre language="haskell"><code> import Text.HTML.TagSoup main :: IO () main = print $ parseTags tags </code></pre> okay
The example rendered by remarkable:
<pre language="haskell"><code> import Text.HTML.TagSoup <p>main :: IO () main = print $ parseTags tags </code></pre> okay</p>
Notice the extra
<p>
inside<pre>
block and the spacing is no longer preserved. -
In its
rules_block/deflist.js
module, the variable at line 98 should have been placed outside the loop. Checking with markdown-it-deflist confirms my finding. The consequence of that is the entire formatting of<dl>
list is determined by the last term's spacing (tight or loose). -
In its
rules_block/htmlblock.js
module, its HTML_TAG regular expressions (line 6 & 7) do not take into account of digits. So<h1>
..<h6>
are not recognized as block tags. So if you enter this:<h4>heading</h4><ul><li>item 1 </li><li>item2</li></ul>
Remarkable yields this:
<p><h4>heading</h4><ul><li>item 1</p> </li><li>item2</li></ul>
Not only is the
<p>
undesirable, its end tag cuts off the the<li>
expression.
These problems altogether present a worrisome picture of a no longer well maintained project. Then, on markdown-it's github page:
markdown-it is the result of the decision of the authors who contributed to 99% of the Remarkable code to move to a project with the same authorship but new leadership (Vitaly and Alex). It's not a fork.
Having worked with both libraries, I can report that markdown-it has extra routines (such as Tokens and default token rendering) that make writing extensions less tedious.
As for the motivation of this PR, if you haven't done so already, please visit the demo using CURRENT markdown plugin and examine the tiddlers under Issues Fixed section. You'll see the issues I uncovered.
This PR loads successfully on both IE and ipad3—after I removed the let
. There are some math expressions that could not be rendered in IE. On the other hand, the current plugin does not load successfully on either platforms. I suspect it may have to do with the remarkable-katex.js
module not being es5 compliant.
As for the motivation of this PR, if you haven't done so already, please visit the demo using CURRENT markdown plugin and examine the tiddlers under Issues Fixed section. You'll see the issues I uncovered.
I did see the fixes and I'm with you. There are definitely some problems with the remarkable library. Since I don't use markdown regularly, I don't have a problem to switch.
checklist plugin removed
One more thing, the immediate backward incompatibility I can see is that I removed the image
parse rule from the renderWikiPragma
. There was a debate whether prettylink
should be included in the pragma or not. Since prettylink
is still not in there and image
rule provides the equivalent syntax for images, I took it out of the default setting... Or maybe both should be IN together?
@cdruan ... Why did you close this one? ... I think it would be worth to be included.
Hi @cdruan
@cdruan ... Why did you close this one? ... I think it would be worth to be included.
See https://github.com/Jermolene/TiddlyWiki5/pull/6528#discussion_r837977350
@pmario and @Jermolene
I've rollbacked changes to parseutils.js
. The entity decoding (of attribute values) is now performed on the TW parse-tree element and widget nodes—with the exception of <$wikify>
and <$entity>
—during the post-processing stage.
Other notable updates:
- Add tw tag block parsing (similar to html blocks) to properly parse tw tags separated by blank lines.
- Remove katex block parsing—I think inline display mode is sufficient. Block parsing only adds ability to include blank lines between
$$...$$
, but this feature is actually undesirable. - Add Extending Markdown demo tiddler to demonstrate the ability to extend the Markdown plugin.
- Update/add few demo tiddlers.
- Update user doc, mainly to specify inline math parsing rules.
Please let me know if you have any concerns. Thanks.
Thanks @cdruan much appreciated.
I've rollbacked changes to
parseutils.js
. The entity decoding (of attribute values) is now performed on the TW parse-tree element and widget nodes—with the exception of<$wikify>
and<$entity>
—during the post-processing stage.
I'm sorry I didn't feedback quick enough on your earlier comments. The decoding of attributes must be applied consistently across all widgets. Arbitrary exceptions add significantly to the cognitive load of the system and future updates.
Is there another approach that doesn't affect wikitext parsing?
Hi @Jermolene
Thanks for your feedback. I was hoping the disruption is deemed minimal because decoding is applied only to the string type of attribute values in markdown tiddlers. The likelihood of users issuing widget calls inside the markdown text is probably low.
Besides allowing consistent grammar across html elements and widgets, decoding the attributes allows me to pass LaTex and fence block markdown contents to TW via <$codeblock>
and <$latex>
widgets. (I can go into details on why I ultimately chose this strategy.) I also translate markdown links and images to <$link>
and <$image>
calls when appropriate and it'd be nice to have tooltips properly rendered when there are entities in the text. So I'm quite dependent on having a mechanism to decode entities in attribute string values.
I believe I found another widget, <$macrocall>
, that requires delayed entity decoding... There are probably other exceptions. Would it be acceptable if the behavior is clearly documented? Alternatively, I could apply decoding only to the attributes of the aforementioned widgets (and html elements) that this PR depends on. Would that work? Personally, it would be nice to have a sure way of escaping quotes in the attribute values. Although delimiting with """
minimizes conflicts, theoretically, you can still have a string that contains """
.
By the way, I just realized that current markdown plugin may be indirectly decoding entities in attribute values anyway. See example. For your reference, same markdown tiddlers rendered by this PR.
@Jermolene ... This should also be considered for v5.2.3
Thanks for your patience @cdruan
Hi @cdruan - apologies for not noticing sooner – please could you resolve the merge conflict? Many thanks
I've rollbacked changes to
parseutils.js
. The entity decoding (of attribute values) is now performed on the TW parse-tree element and widget nodes—with the exception of<$wikify>
and<$entity>
—during the post-processing stage.I'm sorry I didn't feedback quick enough on your earlier comments. The decoding of attributes must be applied consistently across all widgets. Arbitrary exceptions add significantly to the cognitive load of the system and future updates.
Is there another approach that doesn't affect wikitext parsing?
@cdruan @Jermolene has this issue been resolved?
--cdruan wrote-- I've rollbacked changes to parseutils.js. The entity decoding (of attribute values) is now performed on the TW parse-tree element and widget nodes—with the exception of <$wikify> and <$entity>—during the post-processing stage.
--Jermolene wrote-- I'm sorry I didn't feedback quick enough on your earlier comments. The decoding of attributes must be applied consistently across all widgets. Arbitrary exceptions add significantly to the cognitive load of the system and future updates.
Is there another approach that doesn't affect wikitext parsing?
--saqimtias wrote-- @cdruan @Jermolene has this issue been resolved?
I think there has been a misunderstanding in the first place. @cdruan did modify code in the parseutils.js
which is part of the TW core. Mixing PRs in that way shouldn't happen. That's why the "offending" code was removed from this PR.
As far as I can see, this PR only contains code that is needed to handle markdown if the plugin is installed. It shouldn't interfere with the TW core code anymore.
There may still be a problem in parseutils.js
, but this would be a different issue and a different PR.
Sorry, I just saw the recent discussions.
I think we have yet to resolve the issue of decoding HTML entities inside widget and html tag's attribute string values. My current implementation is to decode entities inside attribute values of all html tags and widget tags, except <$wikify>
and <$entity>
and <$macrocall>
. This only applies to markdown tiddlers.
However, I realized there's another approach if we could allow another string literal syntax, say: e"....string content..."
, similar to a raw string syntax in python, etc. Any strings of this form will have their entities decoded during parsing. This way, we could maintain backward compatibility while allowing content like this:
<$codeblock code=e"var s = "foobar";"/>
And there won't be any seemingly arbitrary "exceptions" as before.
Would this new approach be acceptable?
I'm not sure. Why is <$codeblock code="""var s = "foobar";;"""/>
not good enough? It works out of the box.
@pmario
Why is <$codeblock code="""var s = "foobar";;"""/> not good enough? It works out of the box.
This PR needs to be able to programmatically pass any content of a fenced code block to <$codeblock>
widget. So it needs to assume that """
could be in the content.
I could temporarily extend parseStringLiteral()
to parse and decode entities inside e"...."
strings and then restore the function to its original functionality once markdown parsing is done. (A version of this approach has been tested already.) So the extended string syntax will be an unpublished feature that only works within this markdown parser. This would be less drastic and no need to update TW5 doc.
I've updated this PR with the approach described in this comment. I also made few additional updates to support the new "text/markdown" MIME type (commit 8e687eb).
The demo pages new vs. old have been updated as well, powered by 5.2.3-prerelease version.
- Extend syntax: strikethrough, insertion, mark, superscript, subscript, footnotes, definition lists.
Very appreciated update, thanks!
I see that a checklist plugin has been removed.
Is there any plan to support markdown checklists (TODOs), like here on GitHub?
- [ ] to be done
- [x] task is done
Will the revamp also allow the plug-in to store meta file data in markdown front matter instead of needing two files per markdown tiddler?
Will the revamp also allow the plug-in to store meta file data in markdown front matter instead of needing two files per markdown tiddler?
The markdown plugin handles how markdown tiddlers are rendered and displayed in TiddlyWiki, and is not related to loading/save files in the node.js configuration which is a core behaviour.
Will the revamp also allow the plug-in to store meta file data in markdown front matter instead of needing two files per markdown tiddler?
The markdown plugin handles how markdown tiddlers are rendered and displayed in TiddlyWiki, and is not related to loading/save files in the node.js configuration which is a core behaviour.
#4702 says it would be possible with the new markdown plugin by adding the markdown-it-frontmatter plugin to the tw5 plugin. It would be great if this was supported.
#4702 says it would be possible with the new markdown plugin by adding the markdown-it-frontmatter plugin to the tw5 plugin. It would be great if this was supported.
You are mistaken. As described in that comment, the markdown-it-frontmatter plugin can hide the frontmatter but this would have nothing to do with how the tiddlers are saved to disk and the .meta files created on node.js.
#4702 says it would be possible with the new markdown plugin by adding the markdown-it-frontmatter plugin to the tw5 plugin. It would be great if this was supported.
You are mistaken. As described in that comment, the markdown-it-frontmatter plugin can hide the frontmatter but this would have nothing to do with how the tiddlers are saved to disk and the .meta files created on node.js.
Ah, OK. My mistake.
After playing around with markdown tiddlers, I can tell that this PR fixes #6916 and #6919. Are there any blockers?
Thanks @MaxGyver83.
@cdruan is this PR ready for another review?