prosemirror-math
Paste from External Sources (MathJax in the wild, Wikipedia, etc.)
It would be great to modify the default paste behavior to automatically detect math markup in HTML pasted from external sources. Unfortunately, the solution will be messy, as there is not yet a universally accepted way to render math on the web.
See Robert Miner (2010), "MathType, Math Markup, and the Goal of Cut and Paste" for a brief summary of the challenges faced in this area. Here's an excerpt from one of the slides:
Math on the Web formats in the Wild
- Image with TeX code (alt tags, comments, urls)
- Some content is in text (HTML math, TeX source, ASCII art)
- Some is in the DOM (MathML, `<span>`s and CSS)
The following tasks are relatively low-effort and high-reward:
- support pasting MathML expressions when the source TeX code is included as an annotation (this seems to be a standard feature in some MathJax configurations)
- support pasting inline math images from Wikipedia when an `alt` attribute is present
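The two low-effort cases above both come down to reading TeX that the page already ships alongside the rendering. The sketch below shows one way to do that; it is not existing prosemirror-math code. `ElementLike`, `texFromMathMLAnnotation` and `texFromImageAlt` are hypothetical names, and `ElementLike` is just the subset of the DOM `Element` interface the functions touch, so a real `Element` can be passed in. The `{\displaystyle ...}` wrapper stripped from Wikipedia `alt` text is an assumption about how Wikipedia formats that attribute.

```typescript
// Minimal structural subset of the DOM `Element` interface; a real browser
// `Element` satisfies it, and it lets the sketch run outside a browser.
interface ElementLike {
  tagName: string;
  textContent: string | null;
  getAttribute(name: string): string | null;
  children: ArrayLike<ElementLike>;
}

// Depth-first search for <annotation encoding="application/x-tex"> inside a
// MathML subtree; returns the trimmed TeX source or null if there is none.
function texFromMathMLAnnotation(node: ElementLike): string | null {
  if (
    node.tagName.toLowerCase() === "annotation" &&
    node.getAttribute("encoding") === "application/x-tex"
  ) {
    return (node.textContent ?? "").trim();
  }
  for (let i = 0; i < node.children.length; i++) {
    const found = texFromMathMLAnnotation(node.children[i]);
    if (found !== null) return found;
  }
  return null;
}

// Read TeX from an <img> alt attribute. ASSUMPTION: Wikipedia may wrap the
// formula as `{\displaystyle ...}` (or `\textstyle`); strip that if present.
function texFromImageAlt(img: ElementLike): string | null {
  if (img.tagName.toLowerCase() !== "img") return null;
  const alt = img.getAttribute("alt")?.trim();
  if (!alt) return null;
  const match = /^\{\\(?:display|text)style\s+([\s\S]*)\}$/.exec(alt);
  return match ? match[1].trim() : alt;
}
```

With this, an image whose `alt` is `{\displaystyle E=mc^{2}}` yields `E=mc^{2}`, and an `alt` without the wrapper is returned unchanged.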
Some higher-effort tasks:
- Parse MathML expressions directly (see mml2tex and mathml2latex, and this answer).
- The JavaScript XSLT bindings may be helpful, but I'm not sure whether the API is well-supported across browsers.
- There is an old tool called pmml2tex.
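To give a feel for what parsing MathML directly involves, here is a hand-written recursive converter for a handful of Presentation MathML elements. This is a swapped-in illustration, not mml2tex, mathml2latex, pmml2tex or the XSLT route; `mathmlToTex` and `ElementLike` are hypothetical names. Token content is copied verbatim, so Unicode operators such as `×` are not mapped to TeX macros; a usable converter would need a symbol table and many more elements.

```typescript
// Minimal structural subset of the DOM `Element` interface (a real `Element`
// satisfies it), so the sketch can also run outside a browser.
interface ElementLike {
  tagName: string;
  textContent: string | null;
  getAttribute(name: string): string | null;
  children: ArrayLike<ElementLike>;
}

// Hypothetical, deliberately tiny Presentation MathML -> TeX converter.
// Unknown container elements simply concatenate their converted children.
function mathmlToTex(node: ElementLike): string {
  const kids = Array.from(node.children);
  const arg = (i: number): string => (i < kids.length ? mathmlToTex(kids[i]) : "");
  const joinKids = (): string =>
    kids.map((k) => mathmlToTex(k)).filter((s) => s.length > 0).join(" ");

  switch (node.tagName.toLowerCase()) {
    case "mi":
    case "mn":
    case "mo":
      return (node.textContent ?? "").trim(); // copied verbatim
    case "mtext":
      return `\\text{${(node.textContent ?? "").trim()}}`;
    case "msup":
      return `{${arg(0)}}^{${arg(1)}}`;
    case "msub":
      return `{${arg(0)}}_{${arg(1)}}`;
    case "msubsup":
      return `{${arg(0)}}_{${arg(1)}}^{${arg(2)}}`;
    case "mfrac":
      return `\\frac{${arg(0)}}{${arg(1)}}`;
    case "msqrt":
      return `\\sqrt{${joinKids()}}`; // msqrt takes any number of children
    case "annotation":
    case "annotation-xml":
      return ""; // skip embedded source / alternate encodings
    default: // math, mrow, semantics, mstyle, ...
      return joinKids();
  }
}
```

For `x` squared plus one over `n`, marked up with `msup`, `mo` and `mfrac` inside an `mrow`, this produces `{x}^{2} + \frac{1}{n}`. The extra braces are redundant but keep nested scripts unambiguous.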
Things to be cautious of:
- MathJax and KaTeX both include the same math expression multiple times in the same block, e.g. rendered as MathML and SVG simultaneously for compatibility reasons. We need to identify the common parent element and ensure that it is replaced by a single math expression, rather than two or three.
- Differences in pasting behavior between browsers
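One way to handle the duplicate renderings is to group every place TeX was found by the nearest renderer wrapper element and keep one hit per wrapper, replacing the wrapper itself. The sketch assumes KaTeX's `span.katex` wrapper (which holds both its MathML and HTML output); the MathJax class names are unverified guesses and should be checked against real pasted HTML. `collapseRenderings`, `findWrapper` and `WRAPPER_CLASSES` are hypothetical names.

```typescript
// Only the two DOM members this sketch needs; a real `Element` satisfies it.
interface ElementLike {
  getAttribute(name: string): string | null;
  parentElement: ElementLike | null;
}

// ASSUMPTION: class names of wrappers that hold all renderings of one formula.
// `katex` is KaTeX's wrapper; the MathJax names are guesses for v2/v3 output.
const WRAPPER_CLASSES = new Set(["katex", "MathJax", "MathJax_Display", "MathJax_SVG", "MathJax_CHTML"]);

interface MathHit {
  element: ElementLike; // where the TeX was found (annotation, img, ...)
  tex: string;
}

interface Replacement {
  target: ElementLike; // node to replace with a single math node
  tex: string;
}

// Nearest ancestor-or-self carrying one of the wrapper classes, if any.
function findWrapper(el: ElementLike): ElementLike | null {
  for (let cur: ElementLike | null = el; cur !== null; cur = cur.parentElement) {
    const classes = (cur.getAttribute("class") ?? "").split(/\s+/);
    if (classes.some((c) => WRAPPER_CLASSES.has(c))) return cur;
  }
  return null;
}

// Collapse hits sharing a wrapper into one replacement (first hit wins), so a
// formula rendered as MathML plus HTML/SVG becomes a single math node.
function collapseRenderings(hits: MathHit[]): Replacement[] {
  const claimed = new Set<ElementLike>();
  const out: Replacement[] = [];
  for (const hit of hits) {
    const wrapper = findWrapper(hit.element);
    if (wrapper === null) {
      out.push({ target: hit.element, tex: hit.tex }); // no wrapper: keep as-is
    } else if (!claimed.has(wrapper)) {
      claimed.add(wrapper);
      out.push({ target: wrapper, tex: hit.tex });
    }
  }
  return out;
}
```

Renderings that sit next to each other rather than under a shared wrapper would need a different grouping rule.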
Here are some places we might expect users to paste from:
- Wikipedia: Extremely inconsistent -- pages have a mix of MathJax, HTML math, and pre-rendered images
- StackExchange: Uses MathJax. The source code is evidently stored in a `<script type="math/tex; mode=display">` tag within a `.math-container`-classed element.
- ncatlab: Uses MathJax. Source is stored in an `<annotation encoding="application/x-tex">` tag.
- Planet Math: Uses MathJax, with some weird layouts. Display math is sometimes wrapped in a `<table class="ltx_equation ltx_eqn_table">` element. The MathML node has an `alttext` attribute containing the TeX source.
- arXiv: (example) Uses MathJax with the source stored in a `<script type="math/tex">` tag.
- ProofWiki: Uses MathJax with the source in a `<script type="math/tex">` tag.
- Google Docs: ???
- Microsoft Word: ???
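The sites above keep TeX in three places: `<script type="math/tex">` (optionally with `; mode=display`), an `alttext` attribute on the `<math>` element, and an `application/x-tex` annotation. A single per-element extractor could try each in turn, as sketched below. `extractTex`, `findTexAnnotation`, `ElementLike` and `ExtractedMath` are hypothetical names, and the script branch only helps when those tags actually survive the clipboard.

```typescript
// Minimal structural subset of the DOM `Element` interface (a real `Element`
// satisfies it), so the sketch can also run outside a browser.
interface ElementLike {
  tagName: string;
  textContent: string | null;
  getAttribute(name: string): string | null;
  children: ArrayLike<ElementLike>;
}

interface ExtractedMath {
  tex: string;
  display: boolean; // true for display (block) math
}

// Depth-first search for <annotation encoding="application/x-tex">.
function findTexAnnotation(node: ElementLike): string | null {
  if (
    node.tagName.toLowerCase() === "annotation" &&
    node.getAttribute("encoding") === "application/x-tex"
  ) {
    return (node.textContent ?? "").trim();
  }
  for (let i = 0; i < node.children.length; i++) {
    const found = findTexAnnotation(node.children[i]);
    if (found !== null) return found;
  }
  return null;
}

// Try each known TeX source location on a single element.
function extractTex(el: ElementLike): ExtractedMath | null {
  const tag = el.tagName.toLowerCase();
  if (tag === "script") {
    // StackExchange, arXiv, ProofWiki: "math/tex" or "math/tex; mode=display"
    const type = el.getAttribute("type") ?? "";
    if (!type.startsWith("math/tex")) return null;
    return { tex: (el.textContent ?? "").trim(), display: type.includes("mode=display") };
  }
  if (tag === "math") {
    // MathML marks display math with display="block".
    const display = el.getAttribute("display") === "block";
    const alt = el.getAttribute("alttext"); // Planet Math
    if (alt) return { tex: alt.trim(), display };
    const ann = findTexAnnotation(el); // ncatlab, KaTeX output
    if (ann !== null) return { tex: ann, display };
  }
  return null;
}
```

Checking `alttext` before the annotation is an arbitrary choice here; on pages that provide both, either should give the same source.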
I started to implement pasting of math from Wikipedia using a custom ProseMirror `ParseRule` (and its `getContent` property), but ran into some unexpected behavior where the pasted math nodes all come up empty. I opened a question on the ProseMirror forum, which will hopefully resolve the issue.
This website has math rendered using Madoko, which renders math and diagram SVGs server-side and includes them in the following format:
```html
<svg class="snippet math-display math-render-svg math" data-math-full="true" style="..." viewBox="...">
<desc>\begin{tikzpicture}
\matrix[nodes={draw}, row sep=0.3cm,column sep=0.5cm] {
\node [rectangle, draw=none] (eq) {$a = b, b = c, d = e, b = s, d = t: $};&
\node [circle, draw] (abcs) {$a, b, c, s$}; &
\node [circle, draw] (det) {$d, e, t$}; \\
};
\end{tikzpicture}
</desc>
<g id="math-a6e187">...</g>
</svg>
```
This example contains an SVG rendering of a TikZ diagram, which is problematic for KaTeX, the current default renderer, since KaTeX does not support TikZ. Once MathJax is supported, an extension like TikzJax can be used to render diagrams.
UPDATE: It won't be possible to paste from documents rendered with Madoko. The TeX source is contained in a `<desc>` tag within an SVG element, and apparently `<desc>` tags are stripped away in both Chrome and Firefox when copying.
UPDATE: StackExchange keeps its TeX code in `<script type="math/tex">` tags, but these are stripped away when copying for security reasons. To copy from StackExchange, we'll need to parse the MathML directly.