MathJax support and bug fix

Open clark800 opened this issue 1 year ago • 4 comments

This prevents processing of text inside $ and $$ so that MathJax will work. Without this, for example, a math formula with underscores would get broken by <em> insertions.

For single dollar signs, since documents often contain single dollar signs for other reasons, we only change the behavior if the span does not start or end with whitespace and does not contain a newline. In the unlikely event that a document unintentionally matches this pattern, it is even more unlikely that it will cause a real problem because it just prevents processing but doesn't make any changes. Dollar signs inside code blocks will not be affected.
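The single-dollar rule described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the function name `inline_math_len` is hypothetical, escaped dollar signs and code blocks are ignored, and giving up when the first candidate closing `$` fails the whitespace check (rather than searching further) is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the PR. `s` must point at a '$'.
 * Returns the length of the math span including both dollar signs,
 * or 0 if the text should be processed as ordinary markdown.
 */
size_t inline_math_len(const char *s)
{
    size_t i;

    if (s[0] != '$' || s[1] == '\0' || s[1] == '$')
        return 0;                   /* not a single-dollar opener */
    if (isspace((unsigned char)s[1]))
        return 0;                   /* span may not start with whitespace */

    for (i = 1; s[i] != '\0' && s[i] != '\n'; i++) {
        if (s[i] == '$') {
            if (isspace((unsigned char)s[i - 1]))
                return 0;           /* span may not end with whitespace */
            return i + 1;           /* both '$' are part of the span */
        }
    }
    return 0;                       /* newline or end of input before a closing '$' */
}
```

With this sketch, `$a_b$` is recognized as a span (so the underscore would be left alone), while `$ a$`, `$a $`, and a span broken across two lines are not.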

This also fixes a bug that caused "surround" characters to generate tags even when the start and stop tokens were at the same location.

clark800 avatar Oct 29 '23 01:10 clark800

Thanks for the PR! The code seems to work fine.

Things that make me hesitate instead of just pushing the merge button:

  • The syntax is not part of CommonMark and the ecosystem did not settle on one obvious syntax (looking at https://github.com/cben/mathdown/wiki/math-in-markdown)
  • Is the heuristic good enough not to break existing documents? This is hard to judge, unfortunately.
  • Do we need both syntaxes? Using only $$ could further reduce false positives.

karlb avatar Nov 04 '23 14:11 karlb

Given that smu is supposed to be simple and minimal, missing even some CommonMark features, a nonstandard feature like this seems out of place to me. It is probably better kept in personal builds.

Or at the very least, it shouldn't be enabled by default. md4c, for example, has nonstandard extensions, but they are disabled by default and need to be enabled via a CLI flag.

N-R-K avatar Nov 04 '23 14:11 N-R-K

Yeah, I noticed that there are a variety of conventions within markdown, which was a bit surprising, especially for example with a case like md4c which outputs custom HTML tags that don't seem to be compatible with MathJax.

I think the choice of syntax is fairly easy though because it's most natural to not introduce a new syntax for writing LaTeX when you have the option of using the LaTeX syntax. LaTeX inline math delimiters are $ ... $ and \( ... \) and display math delimiters are $$ ... $$ and \[ ... \] (https://www.overleaf.com/learn/latex/Mathematical_expressions). Of these, the ones with backslashes interfere with escaping in markdown so the dollar signs are the only choice that doesn't introduce a new syntax and doesn't interfere with escaping in markdown.

MathJax looks for these LaTeX delimiters, so it would only make sense to translate from another set of delimiters if there were a serious concern about breaking documents, but I think the way it is set up here makes this issue fairly negligible, as explained in my last comment.

GitHub Flavored Markdown uses $/$$ syntax so it's fairly standard (https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/writing-mathematical-expressions). It's also used in popular markdown renderers like Pandoc (https://pandoc.org/chunkedhtml-demo/8.13-math.html).

It is definitely possible that this change could affect the output for existing documents, but only for rare cases like:

I paid $100 and *they* paid -$100.

The difference in the output would just be whether the asterisks get rendered as italics. In general, the difference would be whether inline markdown commands get rendered between spans of dollar signs on a single line where there is no whitespace on the inner sides of the dollar signs, which is unusual on the left side of a dollar sign unless it's a negative value.

Though I would technically consider this a breaking change, I think this is a sufficiently rare edge case that it is not worth worrying about much. I did consider whether there should be a flag for this and I felt that it would be unnecessary complexity.

Pandoc uses the same rule except it additionally checks that the second dollar sign is not followed by a digit, which would also exclude this case. I have a slight preference for the simpler and more symmetric rule in this PR, but if you want to follow the Pandoc convention that would be an easy change.
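To make the difference between the two rules concrete, here is a rough sketch with the extra Pandoc-style check behind a flag. The function `math_span_len` and its `reject_digit_after` parameter are hypothetical names for illustration, not code from the PR or from Pandoc, and rejecting the span outright when a candidate closing `$` fails a check is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper. `s` must point at a '$'. Returns the length of the
 * inline math span including both dollar signs, or 0 if there is none.
 * With reject_digit_after set, a closing '$' followed by a digit is refused.
 */
size_t math_span_len(const char *s, int reject_digit_after)
{
    size_t i;

    if (s[0] != '$' || s[1] == '\0' || s[1] == '$' ||
        isspace((unsigned char)s[1]))
        return 0;                   /* no opener, or whitespace after it */

    for (i = 2; s[i] != '\0' && s[i] != '\n'; i++) {
        if (s[i] != '$')
            continue;
        if (isspace((unsigned char)s[i - 1]))
            return 0;               /* whitespace before the closing '$' */
        if (reject_digit_after && isdigit((unsigned char)s[i + 1]))
            return 0;               /* extra check: "$100 ... -$100" is money */
        return i + 1;
    }
    return 0;                       /* no closing '$' on this line */
}
```

Applied to the text starting at the first dollar sign of `$100 and *they* paid -$100.`, the symmetric rule treats everything up to the second `$` as a span, so the asterisks would stay literal, while the variant with the digit check finds no span and the asterisks would be rendered as emphasis.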

I think both $ and $$ are important because $ means inline math and $$ means display math. Using additional conventions to combine them would increase complexity or reduce flexibility.

clark800 avatar Nov 04 '23 17:11 clark800

I updated the PR so that it will have no impact on the default build and math support is only enabled when built with make math.

This update also makes it possible to configure the math delimiters at build time so it's easy to customize by setting the CPPFLAGS environment variable when running make.

I also decided to use the \[ \] and \( \) delimiters in the output since we don't have to be so careful about causing changes anymore, and this will make it so inline math works without having to change the default MathJax configuration.
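As a rough illustration of build-time configurable delimiters, the sketch below uses preprocessor macros with defaults that can be overridden by `-D` definitions. The macro names and the `emit_math` function are hypothetical, since the PR's actual identifiers are not shown in this thread, and the sketch assumes it is the output delimiters that are configurable. HTML escaping of the formula is left out.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical macro names; each default can be replaced at compile time. */
#ifndef MATH_INLINE_OPEN
#define MATH_INLINE_OPEN "\\("
#endif
#ifndef MATH_INLINE_CLOSE
#define MATH_INLINE_CLOSE "\\)"
#endif
#ifndef MATH_DISPLAY_OPEN
#define MATH_DISPLAY_OPEN "\\["
#endif
#ifndef MATH_DISPLAY_CLOSE
#define MATH_DISPLAY_CLOSE "\\]"
#endif

/*
 * Write the formula tex[0..len) into out, wrapped in the configured
 * delimiters, without touching its contents. Returns what snprintf returns:
 * the number of characters the full result needs, excluding the terminator.
 */
int emit_math(char *out, size_t size, const char *tex, size_t len, int display)
{
    const char *open = display ? MATH_DISPLAY_OPEN : MATH_INLINE_OPEN;
    const char *close = display ? MATH_DISPLAY_CLOSE : MATH_INLINE_CLOSE;

    return snprintf(out, size, "%s%.*s%s", open, (int)len, tex, close);
}
```

Under that assumption, a definition for one of these macros passed through CPPFLAGS would replace the corresponding default, and the defaults shown match the delimiters that MathJax recognizes without extra configuration.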

clark800 avatar Nov 08 '23 22:11 clark800

I deleted my fork so this was automatically closed. I wrote a new markdown renderer called smd which is even simpler and has some extra features including MathJax support: https://github.com/clark800/smd

clark800 avatar Sep 14 '24 20:09 clark800

Interesting. I wonder if I should link to some alternatives with a few words about the main differences, so that everyone can pick the best smu-like for their project and personal taste. Or maybe we can have a shared overview somewhere that can be linked from all the projects. It must be difficult for newcomers to pick one. Or is there already something like it?

karlb avatar Sep 25 '24 15:09 karlb

The smd readme lists most of the features and simplifications of smd, so that could be compared with the smu readme.

At a high level, smd is smaller (538 lines vs 707), faster, has cleaner code, and has a few extra features on net, but it is not quite a superset of smu. However, it is also more strict and a bit less standards compliant. For example, inline spans (e.g. italic text) must open and close on the same line. This is because by design it never uses malloc and processes the input one or two lines at a time. The objective is to have a markdown flavor that is more focused on simplicity and speed without being concerned about compatibility, though you can still write markdown for it that is standards compatible if you follow a few rules.
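To show why a line-at-a-time, malloc-free design leads to the same-line restriction, here is a minimal sketch. It is not smd's code: `render_line` and `append` are hypothetical, only `*text*` emphasis is handled, and output is written into a fixed-size caller buffer and truncated if it does not fit.

```c
#include <stddef.h>
#include <string.h>

/* Append n bytes of s to out (capacity cap, length *len), truncating if full. */
static void append(char *out, size_t cap, size_t *len, const char *s, size_t n)
{
    if (cap == 0 || *len >= cap - 1)
        return;
    if (n > cap - 1 - *len)
        n = cap - 1 - *len;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
}

/*
 * Render one input line into out. An asterisk opens emphasis only if a
 * closing asterisk exists later on the same line; otherwise it is copied
 * through literally. No state is kept between lines and nothing is allocated.
 */
size_t render_line(const char *line, char *out, size_t cap)
{
    size_t len = 0;
    const char *p = line;

    if (cap > 0)
        out[0] = '\0';
    while (*p != '\0') {
        const char *close = (*p == '*') ? strchr(p + 1, '*') : NULL;

        if (close != NULL && close > p + 1) {
            append(out, cap, &len, "<em>", 4);
            append(out, cap, &len, p + 1, (size_t)(close - p - 1));
            append(out, cap, &len, "</em>", 5);
            p = close + 1;
        } else {
            append(out, cap, &len, p, 1);
            p++;
        }
    }
    return len;
}
```

Because each call sees a single line, an asterisk whose partner is on the next line can never be matched, which is the kind of strictness described above.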

clark800 avatar Sep 25 '24 21:09 clark800