asciidoctor-latex
On Reserved Characters and the Two-Universe Problem
One of the thorny issues with the LaTeX backend for Asciidoctor is what to do with characters such as $, &, _ and a few others. The first plays a special role in the LaTeX universe as a delimiter for mathematical expressions, e.g. $(a^2)^3 = 1$. In the universe we normally inhabit, it is for US dollars. (Ahem, and bash?)
The problem is that these two universes are not entirely separate. I've been trying to handle the conflicts with various hacks. Now, however, I have a large set of documents "from the wild" on which to apply the converter, and realized that it is probably best to enforce a separation:
- Asciidoc documents will treat symbols like $, &, _ as normal people do. That means that they will be escaped by Asciidoctor before being sent into the gullet of the Asciidoctor-LaTeX converter. The result should be reliable conversion to LaTeX for those who want it. It is also a route to getting PDF. That said, Asciidoctor's HTML format is more pleasing to the eye.
- Documents written in Asciidoctor-LaTeX will treat symbols like $, &, _ according to the rules of the LaTeX universe. This means that the user must escape these symbols if they are to have their conventional meaning. Mathematicians and other LaTeX users are used to doing this, so this should not cause any sociological problems.
- One has to be liberal about interpreting (1) vs (2). Asciidoctor-LaTeX's [env.FOO] feature can be quite handy for authors who never write an equation. The main point of distinction is the presence of mathematical text.
Noteshare can do a fairly good job of auto-detecting the universe in which the author is operating, but I think the default is just to have an attribute which decides the issue. Thus a law office using Asciidoctor-LaTeX would not set that attribute and lawyerly documents will convert as expected, and ditto for, say, docs with code but no math. We leave the burden of setting the attribute on those who need it the most.
The name of the attribute should not be :latex: since that would be confusing -- the user is already using Asciidoctor-LaTeX.
@mojavelinux, @jirutka, all, I would appreciate your comments before proceeding.
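To make the separation concrete, here is a minimal Ruby sketch of the "normal universe" default. The helper name `escape_reserved` and the `latex_universe:` flag are hypothetical stand-ins for whatever attribute is eventually chosen; nothing here is taken from the Asciidoctor-LaTeX code base.

```ruby
# Hypothetical helper, not part of Asciidoctor-LaTeX.
# Matches $, & or _ unless it is already preceded by a backslash.
LATEX_RESERVED_RX = /(?<!\\)[$&_]/

# Escape reserved characters for LaTeX output, unless the document
# has opted into the LaTeX universe (assumed to be signalled by an attribute).
def escape_reserved(text, latex_universe: false)
  return text if latex_universe
  text.gsub(LATEX_RESERVED_RX) { |ch| "\\#{ch}" }
end

puts escape_reserved('Fees: $100 & up, see fee_table')
puts escape_reserved('$a_1 & a_2$', latex_universe: true)
```

With the flag off, the law-office document gets its dollars escaped; with it on, the text is left for the math-aware pipeline to handle.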
i think the default is just to have an attribute which decides the issue.
I think you are absolutely right. After the issue of native LaTeX markup was brought up again by @edusantana in https://github.com/asciidoc/asciidoc/pull/72#issuecomment-102643894, it got me thinking that this needs to be a different parsing mode...and one that is explicit.
We need to think about the right name for the setting, but imagine we have something like:
:infix-latexmath:
Then, we match raw LaTeX expressions in the document...and pass through the content so that it doesn't get interpreted as AsciiDoc. In other words, latexmath:[(a^2)^3 = 1] becomes exactly equivalent to $(a^2)^3 = 1$ and any use of dollar must be escaped if not a LaTeX delimiter.
We can even toggle this attribute throughout the document so that you can do some parts with infix LaTeX and some parts in the normal AsciiDoc way.
I'd prefer to make the author always define this attribute, even when using Asciidoctor LaTeX. It's an important semantic piece of information that tells a tool how to process the document (for instance, the Atom editor). Without this information, the processor just doesn't know what the author intends.
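A rough sketch of how a tool could follow the toggling, assuming the attribute ends up being called infix-latex (the name is not settled in this thread) and that it is switched off with the usual AsciiDoc unset forms :infix-latex!: or :!infix-latex:. The function `tag_lines` is made up for illustration.

```ruby
# Assumed attribute name; the thread had not settled on one.
ATTR = 'infix-latex'
SET_RX   = /\A:#{Regexp.escape(ATTR)}:\s*\z/
UNSET_RX = /\A:(?:!#{Regexp.escape(ATTR)}|#{Regexp.escape(ATTR)}!):\s*\z/

# Returns [line, infix_enabled] pairs for every line that is not
# itself one of the attribute entries above.
def tag_lines(source)
  enabled = false
  tagged = []
  source.lines(chomp: true).each do |line|
    if line.match?(SET_RX)
      enabled = true
    elsif line.match?(UNSET_RX)
      enabled = false
    else
      tagged << [line, enabled]
    end
  end
  tagged
end

doc = "Costs $5\n:infix-latex:\n$x^2$\n:infix-latex!:\nCosts $6\n"
tag_lines(doc).each { |line, on| puts "#{on ? 'latex' : 'plain'}: #{line}" }
```

An editor plugin could use the same per-line flag to decide which highlighting or preview rules to apply.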
We could even shorten it to something like the following if an extra attribute is too much of a burden:
:stem: infix-latexmath
Btw, when we enable infix-latexmath, it's essential that we treat the contents of infix LaTeX with passthrough semantics. Otherwise, the expression can get mangled by the AsciiDoc parser (perhaps you already do this).
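One way to picture passthrough semantics, as a sketch only: pull the math spans out, run the ordinary substitutions on what is left, then put the math back untouched. The names `with_math_protected` and `superscript` and the %%MATHn%% placeholder are invented for this example, and the superscript lambda is merely a stand-in for an AsciiDoc inline substitution.

```ruby
# Unescaped $ ... $ on a single line.
MATH_SPAN_RX = /(?<!\\)\$(.+?)(?<!\\)\$/

# Replace each math span with a placeholder, yield the masked text to the
# caller's substitutions, then restore the math wrapped in \( ... \).
# Assumes the placeholder text never occurs in the document itself.
def with_math_protected(text)
  stash = []
  masked = text.gsub(MATH_SPAN_RX) do
    stash << Regexp.last_match(1)
    "%%MATH#{stash.size - 1}%%"
  end
  processed = yield(masked)
  processed.gsub(/%%MATH(\d+)%%/) { "\\(#{stash[Regexp.last_match(1).to_i]}\\)" }
end

# Stand-in for an AsciiDoc substitution: ^text^ becomes <sup>text</sup>.
superscript = ->(s) { s.gsub(/\^(\S+?)\^/, '<sup>\1</sup>') }

puts with_math_protected('Let $(a^2)^3 = 1$ and note^1^.') { |s| superscript.call(s) }
```

Run directly on the math, the same substitution would pair up the two carets in (a^2)^3 and corrupt the expression, which is the mangling described above.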
I'm open to infix-latex as well :) We don't need the "math" part.
I do have one question about delimiters and I'm completely open to the response. You probably noticed that I changed the inline delimiter for LaTeX that we pass to MathJax from $ to \(. How do you feel about requiring inline expressions to be written as \((a^2)^3 = 1\) instead of $(a^2)^3 = 1$?
I'll mention that the reason I made this change is two-fold. First, $ is a very typical character to find in content, almost always to mean US dollar. Second, \( and \) are more consistent with the block / display LaTeX delimiters \[ and \].
I ask because the risk is that $ is so embedded in the LaTeX community that it could be unnatural to change now. However, this is also an opportunity to move forward. wdyt?
We can even toggle this attribute throughout the document so that you can do some parts with infix LaTeX and some parts in the normal AsciiDoc way. I'd prefer to make the author always define this attribute, even when using Asciidoctor LaTeX. It's an important semantic piece of information that tells a tool how to process the document (for instance, the Atom editor). Without this information, the processor just doesn't know what the author intends.
Will Atom be able to detect these changes to process the document?
About the attribute name:
:parse-mode: asciidoc|latex
Or perhaps:
:parse-mode: asciidoc+latex
or even
:parse-flags: +latex
Will Atom be able to detect these changes to process the document?
One step at a time. Let's call it a goal. We'll get there.
(1)
I think that passing \( .. \) to MathJax is the right thing to do.
In Asciidoctor-LaTeX, the preprocessor already maps $ … $ to \( … \) as best it can (see the regexes below). If the backend is HTML, there is nothing further to do. If it is LaTeX, it maps \( … \) back to $ … $, since that is what most mathematicians expect.
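The reverse mapping for the LaTeX backend could look roughly like this. It is a sketch rather than the converter's actual code; `inline_math_for` and the backend strings 'html' and 'latex' are assumptions made for the example.

```ruby
# Inline math in \( ... \) form, non-greedy, single line.
INLINE_PAREN_RX = /\\\((.*?)\\\)/

# For the LaTeX backend, turn \( ... \) back into $ ... $.
# For any other backend (e.g. HTML with MathJax), leave the text alone.
def inline_math_for(text, backend)
  return text unless backend == 'latex'
  text.gsub(INLINE_PAREN_RX) { '$' + Regexp.last_match(1) + '$' }
end

puts inline_math_for('Then \((a^2)^3 = 1\) holds.', 'latex')
puts inline_math_for('Then \((a^2)^3 = 1\) holds.', 'html')
```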
IMPORTANT: If you have suggestions on the regexes, that would be great. There are some sticky issues.
(2)
LaTeX accepts \( .. \) and \[ .. \] as well as $ .. $ and $$ … $$. Mathematicians fall into three categories:
(a) Dinosaurs: use $ .. $ and $$ … $$. Still common, even in the younger set.
(b) Hybrids: use $ … $ and \[ … \]. Very common. I am among them (red face).
(c) Moderns: use \( .. \) and \[ … \].
Anyway, we should still pass \( .. \) to MathJax.
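REGEX:
# Inline math written as $ ... $ (non-greedy, single line).
TEX_DOLLAR_RX = /\$(.*?)\$/
# Replacement that produces \( ... \).
TEX_DOLLAR_SUB = '\\\(\1\\\)'
# Same, wrapped in + ... + so that Asciidoc passes the math through untouched.
TEX_DOLLAR_SUB2 = '+\\\(\1\\\)+'
if line.include? '$'
  line = line.gsub TEX_DOLLAR_RX, TEX_DOLLAR_SUB2
end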
The plus signs in TEX_DOLLAR_SUB2 are there to mitigate bad substitutions, e.g. \( (a^2)^3 = 1 \) gets messed up because of the two ^’s.
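One possible tightening of the dollar regex, offered as a suggestion and not as what the converter currently does: use a negative lookbehind so that an escaped \$ is never treated as a math delimiter. The names `STRICT_DOLLAR_RX` and `dollars_to_parens` are made up here.

```ruby
# $ ... $ where neither delimiter is preceded by a backslash.
STRICT_DOLLAR_RX = /(?<!\\)\$(.+?)(?<!\\)\$/

# Rewrite unescaped $ ... $ as +\( ... \)+ and leave \$ alone.
def dollars_to_parens(line)
  line.gsub(STRICT_DOLLAR_RX) { "+\\(#{Regexp.last_match(1)}\\)+" }
end

puts dollars_to_parens('It costs \$5, but $(a^2)^3 = 1$.')
puts dollars_to_parens('Pay \$5 and \$6')
```

The block form of gsub is used so that the backslashes in the replacement are taken literally and need no extra escaping.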
I also have to protect the interior of \[ … \] with the code below:
# protect math, e.g., (a^2)^3, from Asciidoc substitutions:
if line =~ /^\\\[/
  line = line.gsub(/^\\\[/, '+\\[')
end
if line =~ /^\\\]/
  line = line.gsub(/^\\\]/, '\\]+')
end
Jim
http://noteshare.io http://home.noteshareblog.io
(In reply to Dan Allen's comment of May 19, 2015, 7:53 PM, quoted above.)
Addendum: perhaps we can have a preprocessor switch for \( .. \) vs $ … $. The $ .. $ is indeed deeply embedded. The \( .. \) is better syntactically, much better, but involves much more hand motion: not just twice the number of characters, but more motion over the keyboard, and some of it more awkward. You have to use the shift key twice, and also get your pinky on the damn backslash. Inline math occurs with such high frequency that it is a big deal.
That said, I don’t think we should have too many switches.
(In reply to Dan Allen, May 19, 2015, 7:53 PM: https://github.com/asciidoctor/asciidoctor-latex/issues/33#issuecomment-103699738)
If I’ve understood correctly, I think this is the right path. Sorry for the length of this. It should be more readable as Asciidoc.
. Should not all three of latexmath:[(a^2)^3 = 1], $(a^2)^3 = 1$ and \( (a^2)^3 = 1 \) be treated as pass-throughs and sent to MathJax as in the third option?
. How do you reliably recognize $(a^2)^3 = 1$-type expressions? (I guess the answer is that the author needs to write syntactically correct text and escape non-math uses of $; then the regexes can do their work.)
. Sticky point. Via Noteshare I get Asciidoctor-LaTeX from the wild. Some folks write $ .. $ on two lines, or maybe something happens and a line break is introduced. If Asciidoctor only does line-by-line parsing, this may be unsolvable until there is a new parser. (Is @jirutka working on this? I seem to have seen something intriguing.)
. The contents of \[ … \] also need to be passed through. I have some test cases that forced me to put +’s on either side. Here is an example:
. The contents of certain env-blocks need to be passed through as well. There is a short, defined list of these: (1) [env.equation], (2) [env.equationalign], (3) [env.jsxgraph], (4) [env.chem], maybe (5) [env.code]. The list could grow, but it will always be short. (3) is for interactive apps like the mass-spring demo. There should probably also be [env.javascript]. (3) is javascript plus some libraries that need to be called. (4) invokes the mhchem LaTeX package and makes it possible for chemists to write their stuff in a way that works better for them than straight LaTeX. (5) is an automatically numbered code environment that I have been using for some writing.
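On the two-line $ .. $ problem, one workaround short of a new parser might be to convert paragraph by paragraph instead of line by line. This is only a sketch under that assumption; `convert_paragraphwise` is a hypothetical name, and it relies on blank lines separating paragraphs so that a stray $ in one paragraph cannot pair with a $ in another.

```ruby
# Unescaped $ ... $ that may span line breaks (note the /m flag).
MULTILINE_DOLLAR_RX = /(?<!\\)\$(.+?)(?<!\\)\$/m

# Split on blank lines (keeping the separators), convert math within each
# paragraph, and fold any line break inside a math span into one space.
def convert_paragraphwise(source)
  source.split(/(\n[ \t]*\n)/).map do |chunk|
    chunk.gsub(MULTILINE_DOLLAR_RX) do
      body = Regexp.last_match(1)
      "\\(#{body.gsub(/\s*\n\s*/, ' ')}\\)"
    end
  end.join
end

puts convert_paragraphwise("A sum $a +\nb$ here.\n\nCosts $5 flat.\n")
```

An unpaired $ inside a paragraph, as in the second paragraph of the example, is left as it is.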
Here is an example of a special env block that is problematic. It does not render properly because of the ^^ substitutions:
//.tex_pathologies6
[env.equationalign]
--
\omega(s) = x^{-1/2}(x-1)^{-1/2}(x-s)^{-1/2} dx \\
\omega'(s) = \frac{1}{2} x^{-1/2}(x-1)^{-1/2}(x-s)^{-3/2} dx \\
\omega''(s) = \frac{3}{4} x^{-1/2}(x-1)^{-1/2}(x-s)^{-5/2} dx
--
Here is another:
$f'(x) = s(s-1)+ O((x-s))$
The double parens disappear.
Recall that all other [env.FOO] constructs are “normal” in that their block contents should be processed by Asciidoctor / LaTeX . This doesn’t happen — try this test file:
[env.theorem]
We require that
- $ a + b = b + a$
-
$ a + (b + c) = (a + b) + c$. That's all folks!
All this said, I am able to render quite a bit of Asciidoctor-LaTeX text.
Probably my implementation of the environment block is too naive.
(In reply to Dan Allen's comment of May 19, 2015, 7:48 PM, quoted above.)
I like just plain old
:infix-latex:
(In reply to Dan Allen, May 19, 2015, 7:49 PM: https://github.com/asciidoctor/asciidoctor-latex/issues/33#issuecomment-103699381)
I would love to see Asciidoctor-LaTeX in Atom, since I use it all the time and love it. The biggest obstacle right now is that Asciidoctor-LaTeX needs to be cross-compiled via Opal for Asciidoctor-js. However, Asciidoctor-LaTeX uses a Ruby 2.0 feature that is not yet implemented in Opal. @mogztter has done some work on the cross-compilation and we’ve filed an issue with the Opal folks.
You can see many examples of work done with Asciidoctor-LaTeX at
http://math.noteshare.io
See also http://home.noteshareblog.io
(In reply to Eduardo de Santana Medeiros Alexandre's comment of May 19, 2015, 8:13 PM, quoted above.)
I'd like to throw in a note that there are a variety of math delimiters other than $$, \[ \] and \( \). The amsmath environments {align*}, {equation*}, etc. are very useful and cannot be embedded inside $$ delimiters in LaTeX input (so trying to automatically add missing delimiters will be challenging).
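If such environments are to be passed through as they are, without delimiters being added around them, a first step might be to recognize them as whole units. The sketch below assumes a small, non-exhaustive list of amsmath environment names; `MATH_ENVS`, `ENV_RX` and `math_environments` are hypothetical.

```ruby
# Assumed, non-exhaustive list of amsmath display environments.
MATH_ENVS = %w[equation align gather multline].freeze

# Group 1 is the whole environment; group 2 is its name (with optional *),
# which the closing \end{...} must repeat via the \2 backreference.
ENV_RX = /(\\begin\{((?:#{MATH_ENVS.join('|')})\*?)\}.*?\\end\{\2\})/m

# Return every complete math environment found in the source text.
def math_environments(source)
  source.scan(ENV_RX).map(&:first)
end

puts math_environments('See \begin{align*} a &= b \end{align*} and $x$.')
```

Spans found this way could then be handed to the same passthrough mechanism as $ .. $ and \[ .. \].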