Change `$base` to `$self`
The use of 'include': '$base' causes LaTeX syntax highlighting to not work correctly when embedded in another grammar. The short summary of the difference between them, from here, is:
$self points to the grammar $self appears in (points to itself), whereas $base points to the base language of the file, which could be anything.
This means that when language-latex is embedded within language-markdown, every time an 'include': '$base' occurs, Markdown highlighting is included instead of LaTeX highlighting. Ugliness ensues.
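The difference can be sketched with a pair of made-up rules inside text.tex.latex. The regexes and scope names below are illustrative assumptions, not copied from the package:

```cson
# Hypothetical rules inside the text.tex.latex grammar (illustrative only).
'patterns': [
  {
    'begin': '\\\\textbf\\{'
    'end': '\\}'
    'name': 'markup.bold.example.latex'  # assumed scope name
    'patterns': [
      {
        # Resolves to the root grammar of the file being highlighted.
        # In a Markdown file that is the Markdown grammar, so Markdown
        # rules are applied inside \textbf{ ... }.
        'include': '$base'
      }
    ]
  }
  {
    'begin': '\\\\textit\\{'
    'end': '\\}'
    'name': 'markup.italic.example.latex'  # assumed scope name
    'patterns': [
      {
        # Resolves to the grammar this rule is written in (text.tex.latex),
        # regardless of which grammar embeds it.
        'include': '$self'
      }
    ]
  }
]
```

When the file itself is a LaTeX document, both includes resolve to the same grammar; they only diverge when LaTeX is embedded in another language.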
Description of the Change
All occurrences of
'include': '$base'
were replaced with
'include': '$self'
Alternate Designs
There is no other way to prevent incorrect highlighting from the $base file. A compromise that could theoretically work, but does not in practice, is using
{
'include': '$self'
}
{
'include': '$base'
}
This does not work for this situation because Markdown highlighting is still injected where there should only be LaTeX highlighting. In this example, the HTML injections (part of Markdown syntax) cause the entire rest of the document to be miscolored:

Benefits
Accurate syntax highlighting for grammars that embed LaTeX. I think this is primarily Markdown; however, due to the popularity of Pandoc and programs that use it, such as Knitr/R Markdown, this is an important change that would improve a lot of syntax highlighting for math and tables.
Possible Drawbacks
The use of $self instead of $base means that recursive includes happen only within the scope of the inner file, and do not include the rules within the top-level grammar that is embedding the inner file. However, with a few included changes, there should be no drawbacks.
Currently, text.tex.latex.beamer and text.tex.latex.memoir include text.tex.latex, and text.tex.latex includes text.tex.
Including of text.tex
text.tex only uses $base once. That is for anything within arbitrary { ... } blocks.
https://github.com/area/language-latex/blob/2447b74978e2584e610e14d956c1c41cc1b8c7ab/grammars/tex.cson#L73-L88
To fix this, I add that rule to the end of text.tex.latex and change both copies to include: '$self'. This should give highlighting nearly identical to the current behavior. (I could also add this rule to text.tex.latex.beamer and text.tex.latex.memoir, but this seems unnecessary, as those provide few extra rules and are unlikely to be nested within arbitrary { } blocks.)
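The copied rule has roughly the following shape. This is only a sketch: the scope name is an assumption, and the linked lines of tex.cson remain the authoritative version.

```cson
# Sketch of the arbitrary-braces rule as it might appear at the end of the
# pattern list in text.tex.latex (scope name is an assumption).
{
  'begin': '\\{'
  'end': '\\}'
  'name': 'meta.group.braces.tex'
  'patterns': [
    {
      # '$self' here means text.tex.latex, so the contents of { ... } are
      # highlighted with LaTeX rules even when the file is Markdown.
      'include': '$self'
    }
  ]
}
```

Keeping a copy in text.tex.latex matters because `$self` inside tex.cson resolves to text.tex alone, which would drop the LaTeX-specific rules inside braces.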
Including of text.tex.latex
The only other side effects to watch out for come from text.tex.latex.beamer and text.tex.latex.memoir including text.tex.latex. However, these side effects seem rare, if not impossible. The extra rules provided within text.tex.latex.beamer and text.tex.latex.memoir seem to be top-level only, and they would not be used within any of the environments in text.tex.latex that currently employ $base.
For example, you would not have
\begin{equation}
\begin{frame}
\end{frame}
\end{equation}
This means that it's fine to have include: $self and not include: $base in each of these text.tex.latex environments, because they would not need any special rules from the text.tex.latex.memoir or text.tex.latex.beamer grammars.
Applicable Issues
https://github.com/burodepeper/language-markdown/pull/226
@Aerijo
I'm pretty sure I accounted for all possible drawbacks of switching from $base to $self. I can see why C/C++ would want to use $base, but I think its use is unnecessary here.
With these changes, syntax highlighting when embedded in Markdown works correctly:

text:
```latex
\begin{table}[htbp]\centering
\def\sym#1{\ifmmode^{#1}\else\(^{#1}\)\fi}
\begin{tabular}{l*{5}{c}}
\toprule
&\multicolumn{1}{c}{(1)}&\multicolumn{1}{c}{(2)}&\multicolumn{1}{c}{(3)}&\multicolumn{1}{c}{(4)}&\multicolumn{1}{c}{(5)}\\
&\multicolumn{1}{c}{Dep. Var. }&\multicolumn{1}{c}{Dep. Var. }&\multicolumn{1}{c}{Dep. Var. }&\multicolumn{1}{c}{Dep. Var. }&\multicolumn{1}{c}{Dep. Var. }\\
Variable & 0.00\sym{**} & 0.00\sym{**} & 0.00 & 0.00 & 0.00\sym{**} \\
& (0) & (0) & (0) & (0) & (0) \\
\addlinespace
Variable & 0.00\sym{***}& 0.00\sym{***}& 0.00\sym{***}& 0.00\sym{***}& 0.00\sym{***}\\
& (0) & (0) & (0) & (0) & (0) \\
\addlinespace
Constant & 0.00\sym{***}& 0.00\sym{***}& 0.00\sym{***}& 0.00\sym{***}& 0.00\sym{***}\\
& (0) & (0) & (0) & (0) & (0) \\
\midrule
N & 0 & 0 & 0 & 0 & 0 \\
\bottomrule
\multicolumn{6}{l}{\footnotesize Standard errors in parentheses}\\
\multicolumn{6}{l}{\footnotesize \sym{*} \(p<0.05\), \sym{**} \(p<0.01\), \sym{***} \(p<0.001\)}\\
\end{tabular}
\end{table}
```
@Aerijo any chance this can get a review? I believe with my edits there are only upsides and no drawbacks.
@kylebarron Changing base to self introduces some unwanted side effects. E.g.,
{
\begin{fboxverbatim}
foo
\end{fboxverbatim}
}
When using LaTeX Memoir, the contents were originally verbatim, but the change now treats them as a generic latex environment (because latex grabs hold of the contents of {}). Using base didn't have this issue, because it would point back to memoir.
How many people use memoir, I don't know. This sort of thing applies to everything else in memoir and beamer as well.
Personally, I believe the better solution is to make a dedicated grammar for embedded latex. It would be bare bones; highlighting commands, math delims (but only the delim itself, not the contents), and avoid any begin/end rules. This way, we still get reasonable highlighting, but no risk of breaking anything.
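A match-only grammar of the kind described above might look roughly like this. It is a sketch under stated assumptions: the scope name, display name, regexes, and rule scope names are all hypothetical and not taken from any existing package.

```cson
# Hypothetical bare-bones grammar for embedded LaTeX (all names assumed).
# Every rule is a single-line 'match', so no scope can leak past its line.
'scopeName': 'text.tex.latex.embedded'
'name': 'LaTeX (embedded sketch)'
'patterns': [
  {
    # A control word such as \alpha or \frac
    'match': '\\\\[A-Za-z@]+'
    'name': 'support.function.general.latex'
  }
  {
    # Math delimiters only: \( \) \[ \] $ $$ (the contents stay unscoped)
    'match': '\\\\[()\\[\\]]|\\$\\$?'
    'name': 'punctuation.definition.math.latex'
  }
]
```

The trade-off is that nothing between the delimiters is scoped as math, since that would require a begin/end pair.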
I'll check that out when I get back to my computer. ~~But I just realized that the only thing that needs to change to self from base for it to work with latex is the text.tex.latex file, because that's what Markdown points to. The other special classes can stay the same.~~
@Aerijo TL;DR: With some minimal repetition, we can add the 1-3 rules that are broken by this change to the memoir and beamer files, so that the rules within those files are recursively scoped first before descending into LaTeX and TeX rules.
These are the rules from memoir and beamer.
'begin': '(?:\\s*)((\\\\)begin)(\\{)(framed|shaded|leftbar)(\\})'
'begin': '(?:\\s*)((\\\\)begin)(\\{)((?:fboxv|boxedv|V)erbatim)(\\})'
'begin': '(?:\\s*)((\\\\)begin)(\\{)(alltt)(\\})'
'begin': '(\\\\use(?:color|font|inner|outer)?theme)(?:(\\[)([^\\]]*)(\\]))?(\\{)'
'begin': '(?:\\s*)((\\\\)begin)(\\{)(frame)(\\})'
'match': '((\\\\)frametitle)(\\{)(.*)(\\})'
As long as these rules don't appear as patterns inside environments changed to $self, we have no drawbacks. Since these are top-level rules, I assert that they would not exist within any environment that currently uses base except for { ... } and maybe 1 or 2 others (see below). It's trivial to fix this by adding a simple { ... } environment to the beamer and memoir files, such that if the grammar starts with either of those files and sees { ... }, it looks within its own rules before descending into text.tex.latex (see latest commit).
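One way to picture that fix is the following sketch of the root pattern list in the memoir grammar. The layout, rule bodies, and scope name are assumptions for illustration; the commit itself is authoritative.

```cson
# Sketch only: possible layout of the root patterns in the memoir grammar.
'scopeName': 'text.tex.latex.memoir'
'patterns': [
  # ... memoir-specific rules (framed, fboxverbatim, alltt, ...) go first ...
  {
    # Arbitrary { ... } group. Because this copy lives in the memoir file,
    # '$self' resolves to text.tex.latex.memoir, so memoir rules are tried
    # inside the braces before the LaTeX and TeX rules.
    'begin': '\\{'
    'end': '\\}'
    'name': 'meta.group.braces.tex'  # assumed scope name
    'patterns': [
      {
        'include': '$self'
      }
    ]
  }
  {
    # Everything else falls through to the plain LaTeX grammar.
    'include': 'text.tex.latex'
  }
]
```

The brace rule has to come before the include of text.tex.latex; otherwise the LaTeX copy of the rule would match the opening brace first.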
Below I've grouped the rules for which the package currently uses $base recursively. They can be grouped into
- programming
- environments
- literal text
- math
The above rules wouldn't appear within math or literal text, so that leaves programming and environments.
I can't imagine any of the above appearing within a named environment or within \ExplSyntaxOn ... \ExplSyntaxOff. Potentially they could occur within the arbitrary \begin{\w+} ... \end{$1} or within \ProvidesExplPackage ... since that rule currently doesn't have an end clause.
If you think either of these two rules could allow recursive memoir or beamer text, I propose adding these two rules, with informative comments, to the memoir and beamer files. It's only ~20 lines of repetition, and then we could satisfy all constituents of the package.
Programming:
- \ExplSyntaxOn ... \ExplSyntaxOff
- \ProvidesExplPackage ... until end of file? The clause has no end regex... I'm guessing that's a typo. If that's not a typo, it could probably be better highlighted as a match rather than a begin-end pair. Or at the very least it could end at \begin{document}.
Environments:
- \begin{align|equation|multline|split|gather|alignat|aligned|gathered|eqnarray|array|tabular|itemize|enumerate|description|list} ... \end{$1}
- \begin{} ... \end{} of any not previously named environment.
Literal text:
- \marginpar{ ... }
- \footnote{ ... }
- \emph{ ... }
- \textit{ ... }
- \textbf{ ... }
- \texttt{ ... }
Math:
- \( ... \)
- \[ ... \]
Arbitrary braces clause, currently in text.tex but proposed to be added to text.tex.latex, text.tex.latex.beamer, and text.tex.latex.memoir:
- { ... }
Hi @Aerijo, I pushed a commit that catches all \begin{}-\end{} environments inside latex-memoir and latex-beamer. So if there's some arbitrary environment that allows a memoir-specific or beamer-specific token to be nested within it, it will highlight correctly (i.e. look first within memoir or beamer before descending to latex.cson).
Could you please take a look?
@kylebarron I can't think of anything this breaks. But then again, it's late right now and I'm tired. I'll check back in tomorrow and (probably) merge.
My biggest concern was changing base -> self in tex.cson math, but it seems the behaviour is pretty much broken currently anyway. Latex commands won't (and never did) work in $...$ because they get captured by the "generic math command" rule.
Yup, the changes to add \begin{\w+} break all environments. E.g., the equation environment will no longer be scoped as math in a memoir document.
Personally, I strongly believe we should be embedding a customised subset of the language instead. I've been sitting on this for a while because something was breaking, but I finally found and squashed the bug.
This way, we can remove begin/end matches entirely (and mostly prevent leaking scopes). The end result would be similar to this:

It may not highlight math the same, but it's much safer inside a block (where the highlighting is less important too IMO). All of the rules here are just `match` rules.
I've been on vacation the last week and haven't been able to look at this until now.
You're right that the most recent commit was misguided, but I still strongly believe that it is both possible and desirable to use the standard LaTeX grammar for embedded purposes.
Catching arbitrary \begin ... \end environments in the beamer and memoir files would only be necessary if some of the beamer- or memoir-specific commands were included in some unknown \begin ... \end environment. Since the number of beamer- and memoir-specific commands is tiny, and since logically they are top-level commands, this would be exceedingly rare. Some small corner case could be added to the beamer or memoir file later upon request.
I reverted that commit and I contend that the current state is stable without breaking environments.
> It may not highlight math the same
Your embedded syntax appears to not highlight math at all, which would be a huge step backwards for most users of Markdown, who use LaTeX syntax mostly for math.