On Reserved Characters and the Two-Universe Problem

Open jxxcarlson opened this issue 9 years ago • 11 comments

One of the thorny issues with the LaTeX backend for Asciidoctor is what to do with characters such as $, &, _ and a few others. The first plays a special role in the LaTeX universe as a delimiter for mathematical expressions, e.g. $(a^2)^3 = 1$. In the universe we normally inhabit, it is for US dollars. (Ahem, and bash?)

The problem is that these two universes are not entirely separate. I've been trying to handle the conflicts with various hacks. Now, however, I have a large set of documents "from the wild" on which to apply the converter, and realized that it is probably best to enforce a separation:

  1. Asciidoc documents will treat symbols like $, &, _ as normal people do. That means that they will be escaped by Asciidoctor before being sent into the gullet of the Asciidoctor-LaTeX converter. The result should be reliable conversion to LaTeX for those who want it. It is also a route to getting PDF. That said, Asciidoctor's HTML format is more pleasing to the eye.
  2. Documents written in Asciidoctor-LaTeX will treat symbols like $, &, _ according to the rules of the LaTeX universe. This means that the user must escape these symbols if they are to have their conventional meaning. Mathematicians and other LaTeX users are used to doing this, so this should not cause any sociological problems.
  3. One has to be liberal about interpreting (1) vs (2). Asciidoctor-LaTeX's [env.FOO] feature can be quite handy for authors that never write an equation. The main point of distinction is the presence of mathematical text.

Noteshare can do a fairly good job of auto-detection of the universe in which the author is operating, but I think the default is just to have an attribute which decides the issue. Thus a law office using Asciidoctor-LaTeX would not set that attribute and lawyerly documents will convert as expected, and ditto for, say, docs with code but no math. We leave the burden of setting the attribute on those who need it the most.

The name of the attribute should not be :latex: since that would be confusing -- the user is already using Asciidoctor-LaTeX.

@mojavelinux, @jirutka, all, I would appreciate your comments before proceeding.

jxxcarlson avatar May 19 '15 09:05 jxxcarlson

i think the default is just to have an attribute which decides the issue.

I think you are absolutely right. After the issue of native LaTeX markup was brought up again by @edusantana in https://github.com/asciidoc/asciidoc/pull/72#issuecomment-102643894, it got me thinking that this needs to be a different parsing mode...and one that is explicit.

We need to think about the right name for the setting, but imagine we have something like:

:infix-latexmath:

Then, we match raw LaTeX expressions in the document...and pass through the content so that it doesn't get interpreted as AsciiDoc. In other words, latexmath:[(a^2)^3 = 1] becomes exactly equivalent to $(a^2)^3 = 1$ and any use of dollar must be escaped if not a LaTeX delimiter.

We can even toggle this attribute throughout the document so that you can do some parts with infix LaTeX and some parts in the normal AsciiDoc way.

I'd prefer to make the author always define this attribute, even when using Asciidoctor LaTeX. It's an important semantic piece of information that tells a tool how to process the document (for instance, the Atom editor). Without this information, the processor just doesn't know what the author intends.

If an extra attribute is too much of a burden, we could even fold it into the existing stem attribute, something like:

:stem: infix-latexmath

Btw, when we enable infix-latexmath, it's essential that we treat the contents of infix LaTeX with passthrough semantics. Otherwise, the expression can get mangled by the AsciiDoc parser (perhaps you already do this).
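
To make the toggling idea concrete, here is a minimal sketch in plain Ruby. It is not how Asciidoctor or Asciidoctor-LaTeX is implemented. It assumes the attribute is called infix-latex (one of the names floated in this thread), that it is switched off with the usual AsciiDoc unset forms (:infix-latex!: or :!infix-latex:), and that inline math is protected by wrapping it in + passthrough marks. The method name protect_infix_latex is hypothetical.

```ruby
# Hypothetical sketch: `infix-latex` is only a name proposed in this thread,
# not an attribute that Asciidoctor defines.
ATTR_ON_RX     = /\A:infix-latex:\s*\z/
ATTR_OFF_RX    = /\A:(?:!infix-latex|infix-latex!):\s*\z/
# $ ... $ on a single line; dollars preceded by a backslash are skipped.
INLINE_MATH_RX = /(?<!\\)\$(.+?)(?<!\\)\$/

# Walk the source lines, tracking whether infix LaTeX is currently enabled,
# and rewrite $ ... $ to +\( ... \)+ only inside enabled regions.
def protect_infix_latex(lines)
  enabled = false
  lines.map do |line|
    if ATTR_ON_RX.match?(line)
      enabled = true
      line
    elsif ATTR_OFF_RX.match?(line)
      enabled = false
      line
    elsif enabled
      line.gsub(INLINE_MATH_RX) { "+\\(#{Regexp.last_match(1)}\\)+" }
    else
      line
    end
  end
end
```

With the attribute unset, a line such as "Cost: $5 or $6" passes through untouched; between :infix-latex: and :infix-latex!: the same dollars would be read as math delimiters, which is why the author must escape them there.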

mojavelinux avatar May 19 '15 23:05 mojavelinux

I'm open to infix-latex as well :) We don't need the "math" part.

mojavelinux avatar May 19 '15 23:05 mojavelinux

I do have one question about delimiters and I'm completely open to the response. You probably noticed that I changed the inline delimiter for LaTeX that we pass to MathJax from $ to \(. How do you feel about requiring inline expressions to be written as \((a^2)^3 = 1\) instead of $(a^2)^3 = 1$?

I'll mention the reason I made this change is two-fold. First, $ is a very typical character to find in content, almost always to mean US dollar. Second, \( and \) are more consistent with the block / display LaTeX delimiter of \[ and \].

I ask because the risk is that $ is so embedded in the LaTeX community that it could be unnatural to change now. However, this is also an opportunity to move forward. wdyt?

mojavelinux avatar May 19 '15 23:05 mojavelinux

We can even toggle this attribute throughout the document so that you can do some parts with infix LaTeX and some parts in the normal AsciiDoc way. I'd prefer to make the author always define this attribute, even when using Asciidoctor LaTeX. It's an important semantic piece of information that tells a tool how to process the document (for instance, the Atom editor). Without this information, the processor just doesn't know what the author intends.

Will Atom be able to detect these changes to process the document?

About the attribute name:

:parse-mode: asciidoc|latex

edusantana avatar May 20 '15 00:05 edusantana

Or perhaps:

:parse-mode: asciidoc+latex

or even

:parse-flags: +latex

Will Atom be able to detect these changes to process the document?

One step at a time. Let's call it a goal. We'll get there.

mojavelinux avatar May 20 '15 00:05 mojavelinux

(1)

I think that passing \( .. \) to MathJax is the right thing to do.

In Asciidoctor-LaTeX, the preprocessor already maps $ … $ to \( … \) as best it can (see the regexes below). If the backend is HTML, there is nothing further to do. If it is LaTeX, it maps \( … \) back to $ … $, since that is what most mathematicians expect.

IMPORTANT: If you have suggestions on the regexes, that would be great; there are some sticky issues.

(2)

LaTeX accepts \( .. \) and \[ .. \] as well as $ .. $ and $$ … $$. Mathematicians fall into three categories:

(a) Dinosaurs: use $ .. $ and $$ … $$. Still common, even in the younger set.
(b) Hybrids: use $ … $ and \[ … \]. Very common. I am among them (red face).
(c) Moderns: use \( .. \) and \[ … \].

Anyway, we should still pass \( .. \) to MathJax.

REGEX:

TEX_DOLLAR_RX = /\$(.*?)\$/
TEX_DOLLAR_SUB = '\\\(\1\\\)'
TEX_DOLLAR_SUB2 = '+\\\(\1\\\)+'

if line.include? '$'
  line = line.gsub TEX_DOLLAR_RX, TEX_DOLLAR_SUB2
end

The plus signs in TEX_DOLLAR_SUB2 are there to mitigate bad substitutions e.g.

\( (a^2)^3 = 1 \)

gets messed up because of the two ^’s.

I also have to protect the interior of \[ … \] with the code below:

# protect math, e.g., (a^2)^3, from Asciidoc substitutions:
if line =~ /^\\\[/
  line = line.gsub /^\\\[/, '+\\['
end
if line =~ /^\\\]/
  line = line.gsub /^\\\]/, '\\]+'
end
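
One possible refinement of the $ … $ regex, offered only as a sketch: skip dollars that the author has escaped with a backslash, so that \$5 survives as a currency amount. The names INLINE_DOLLAR_RX and dollars_to_parens are hypothetical and not part of Asciidoctor-LaTeX. The block form of gsub is used so the backslashes in the replacement do not have to be doubled up the way they are in TEX_DOLLAR_SUB2.

```ruby
# Hypothetical helper, not part of Asciidoctor-LaTeX.
# Matches $ ... $ on one line; a dollar preceded by a backslash is neither
# an opening nor a closing delimiter.
INLINE_DOLLAR_RX = /(?<!\\)\$(.+?)(?<!\\)\$/

def dollars_to_parens(line)
  return line unless line.include? '$'
  # Wrap the math in + ... + so AsciiDoc substitutions leave it alone.
  line.gsub(INLINE_DOLLAR_RX) { "+\\(#{Regexp.last_match(1)}\\)+" }
end
```

Unescaped currency, as in "pay $5 and $6", would still be read as math. That is the case the proposed attribute is meant to settle.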

Jim

http://noteshare.io http://home.noteshareblog.io

jxxcarlson avatar May 20 '15 02:05 jxxcarlson

Addendum: perhaps we can have a preprocessor switch for \( .. \) vs $ … $. The $ .. $ is indeed deeply embedded. The \( .. \) is better syntactically, much better, but involves much more hand motion: not just twice the number of characters, but more motion over the keyboard, some of it more awkward. You have to use the shift key twice, and also get your pinky on the damn backslash. But inline math occurs with such high frequency that it is a big deal.

That said, I don’t think we should have too many switches.

jxxcarlson avatar May 20 '15 02:05 jxxcarlson

If I’ve understood correctly, I think this is the right path. Sorry for the length of this. It should be more readable as Asciidoc.

. Should not all three of latexmath:[(a^2)^3 = 1], $(a^2)^3 = 1$ and \( (a^2)^3 = 1 \) be treated as pass-throughs and sent to MathJax as in the third option? How do you reliably recognize $(a^2)^3 = 1$-type expressions? (I guess the answer is that the author needs to write syntactically correct text and escape non-math uses of $; then the regexes can do their work.)

. Sticky point. Via Noteshare I get Asciidoctor-LaTeX from the wild. Some folks write $ .. $ on two lines, or maybe something happens and a line break is introduced. If Asciidoctor only does line-by-line parsing, this may be unsolvable until there is a new parser. (Is @jirutka working on this? I seem to have seen something intriguing.)

. The contents of \[ … \] also need to be passed through. I have some test cases that forced me to put +’s on either side. Here is an example:

. The contents of certain env-blocks need to be passed through as well. There is a short, defined list of these: (1) [env.equation], (2) [env.equationalign], (3) [env.jsxgraph], (4) [env.chem], maybe (5) [env.code]. The list could grow, but it will always be short. (3) is for interactive apps like the mass-spring demo; it is JavaScript plus some libraries that need to be called. There should probably also be [env.javascript]. (4) invokes the mhchem LaTeX package and makes it possible for chemists to write their stuff in a way that works better for them than straight LaTeX. (5) is an automatically numbered code environment that I have been using for some writing.
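
On the two-line $ .. $ problem, one workaround that does not need a new parser is to run a pass over the whole source text before it is split into lines, joining any inline math that straddles a line break. The sketch below is an assumption about how that could look, not existing Asciidoctor-LaTeX code; join_multiline_math is a hypothetical name. It refuses to match across a blank line, so a stray dollar in one paragraph cannot pair up with one in the next.

```ruby
# Hypothetical pre-pass over the full source text (not line by line).
# Body of the math: ordinary characters, backslash escapes, or a newline
# that is NOT followed by a blank line.
MULTILINE_DOLLAR_RX = /(?<!\\)\$((?:[^$\\\n]|\\.|\n(?![ \t]*\n))+?)\$/

def join_multiline_math(text)
  text.gsub(MULTILINE_DOLLAR_RX) do
    body = Regexp.last_match(1)
    # Collapse each interior line break to a single space so a line-based
    # parser sees the expression on one line, then add + passthrough marks.
    "+\\(#{body.gsub(/[ \t]*\n[ \t]*/, ' ')}\\)+"
  end
end
```

Because the output keeps math on a single line, the existing line-by-line handling could stay as it is; whether the changed line count matters for error reporting is a separate question.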

Here is an example of a special env block that is problematic. It does not render properly because of the ^^ substitutions:

//.tex_pathologies6
[env.equationalign]
--
\omega(s) = x^{-1/2}(x-1)^{-1/2}(x-s)^{-1/2} dx \\
\omega'(s) = \frac{1}{2} x^{-1/2}(x-1)^{-1/2}(x-s)^{-3/2} dx \\
\omega''(s) = \frac{3}{4} x^{-1/2}(x-1)^{-1/2}(x-s)^{-5/2} dx
--

Here is another:

$f'(x) = s(s-1)+ O((x-s))$

The double parens disappear.

Recall that all other [env.FOO] constructs are “normal” in that their block contents should be processed by Asciidoctor / LaTeX. This doesn’t happen; try this test file:


[env.theorem]

We require that

  • $ a + b = b + a$
  • $ a + (b + c) = (a + b) + c$. That's all folks!


All this said, I am able to render quite a bit of Asciidoctor-LaTeX text.

Probably my implementation of the environment block is too naive.
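
The split between pass-through env blocks and "normal" ones could be kept in one small table, so the block processor asks a single question before deciding whether to run AsciiDoc substitutions on the contents. This is a sketch under assumptions: the constant and method names are hypothetical, and [env.code] is left out because it was only a "maybe" above.

```ruby
# Hypothetical lookup; the env names come from the list earlier in this comment.
PASSTHROUGH_ENVS = %w[equation equationalign jsxgraph chem].freeze
ENV_LINE_RX = /\A\[env\.(\w+)\]\s*\z/

# True when the block attribute line names an env whose contents should be
# handed on verbatim (no ^, _, or other AsciiDoc substitutions).
def passthrough_env?(line)
  m = ENV_LINE_RX.match(line)
  !m.nil? && PASSTHROUGH_ENVS.include?(m[1])
end
```

For example, passthrough_env?('[env.equationalign]') is true, while passthrough_env?('[env.theorem]') is false, so the theorem body would still go through normal processing, including its inline $ … $.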

jxxcarlson avatar May 20 '15 04:05 jxxcarlson

I like just plain old

:infix-latex:

jxxcarlson avatar May 20 '15 04:05 jxxcarlson

I would love to see Asciidoctor-LaTeX in Atom, since I use it all the time and love it. The biggest obstacle right now is that Asciidoctor-LaTeX needs to be cross-compiled via Opal for Asciidoctor-js. However, Asciidoctor-LaTeX uses a Ruby 2.0 feature that is not yet implemented in Opal. @mogztter has done some work on the cross-compilation and we’ve filed an issue with the Opal folks.

You can see many examples of work done with Asciidoctor-LaTeX at

http://math.noteshare.io

See also http://home.noteshareblog.io

jxxcarlson avatar May 20 '15 04:05 jxxcarlson

I'd like to throw in a note that there are a variety of math delimiters other than $$, \[ \] and \( \). The amsmath {align*}, {equation*} etc. environments are very useful and cannot be embedded inside $$ delimiters in LaTeX input (so trying to automatically add missing delimiters will be challenging).
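
A converter that adds delimiters automatically would at least need to recognize these environments and leave them bare. A minimal sketch, assuming a hypothetical helper needs_display_delimiters? and an illustrative (not exhaustive) list of environment names:

```ruby
# Illustrative list only; amsmath defines more display environments than these.
AMS_ENVS = %w[align equation gather multline].freeze
# Matches \begin{align}, \begin{align*}, \begin{equation*}, and so on.
AMS_BEGIN_RX = /\\begin\{(?:#{AMS_ENVS.join('|')})\*?\}/

# False when the TeX already opens its own display environment, in which
# case wrapping it in $$ ... $$ or \[ ... \] would be an error in LaTeX.
def needs_display_delimiters?(tex)
  !AMS_BEGIN_RX.match?(tex)
end
```

This only covers the detection half; deciding where such an environment ends, and what to do with text that mixes both styles, is the challenging part noted above.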

oddhack avatar Sep 17 '15 06:09 oddhack