
Non-matching brackets not parsed correctly

Open ivanistheone opened this issue 5 years ago • 6 comments

Certain math notation involves non-matched brackets. For example, the set of nonnegative numbers is denoted $[0, \infty)$ in interval notation. TexSoup handles this notation fine on its own but has trouble if a command precedes the non-matching expression, e.g. $S \cup [0, \infty)$.

tokenize(categorize(r"""$\cup [0, \infty)$"""))

gives:

tokens= '$'_MathSwitch
  '\'_Escape
  'cup'_CommandName
  ' '_MergedSpacer
  '['_BracketBegin
  '0, '_Text
  '\'_Escape
  'infty'_CommandName
  ')'_Text
  '$'_MathSwitch

I think it sees the command followed by the _BracketBegin, so it starts looking for a closing bracket, treating what follows as optional arguments for the command \cup.

Here is the minimal failing test case:

from TexSoup import TexSoup


def test_mixed_brackets():
    """Tests handling of math with non-matching bracket after a tex command."""
    soup = TexSoup(r"""$(-\infty,0]$""")  # works fine
    soup = TexSoup(r"""$[0, \infty)$""")  # works fine
    # GH115
    soup = TexSoup(r"""$S \cup [0, \infty)$""")  # currently fails
    assert True

ivanistheone avatar Oct 14 '20 20:10 ivanistheone

Oof yeah you're right.

The problem is that a space can appear between a command and its arguments, so a [ that follows \cup looks like the start of an optional argument. TexSoup needs to know how many args to expect, so the temporary solution is to add an entry to this dictionary (something like cup: (0, 0) for 0 required, 0 optional args): https://github.com/alvinwan/TexSoup/blob/51334866afa5033b3b6c6408ec2a8c5d69c32abe/TexSoup/reader.py#L29
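
To make the signature idea concrete, here is a toy sketch in plain Python. It is not TexSoup's parser: `read_optional_arg` and the small `SIGNATURES` table below are hypothetical names for illustration only, and the real dictionary in reader.py may be consulted quite differently. The sketch only shows why knowing "0 optional args" lets a parser leave `[` alone.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: command name -> (required args, optional args).
SIGNATURES: Dict[str, Tuple[int, int]] = {"cup": (0, 0), "section": (1, 1)}


def read_optional_arg(
    command: str, rest: str, signatures: Dict[str, Tuple[int, int]]
) -> Tuple[Optional[str], str]:
    """Return (optional_arg, remaining_text) for the text after `command`.

    optional_arg is None when nothing is consumed as an optional argument.
    """
    known = signatures.get(command)
    if known is not None and known[1] == 0:
        # The signature says there are no optional args: '[' is ordinary text.
        return None, rest
    stripped = rest.lstrip()  # a space may separate a command from its args
    if not stripped.startswith("["):
        return None, rest
    end = stripped.find("]")
    if end == -1:
        # Unknown command plus an unmatched '[': the failure mode in this issue.
        raise ValueError(f"no matching ']' after \\{command}")
    return stripped[1:end], stripped[end + 1:]
```

With the table in place, `read_optional_arg("cup", r" [0, \infty)", SIGNATURES)` returns the text untouched, while the same call with an empty table raises because no `]` is found.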

I know cup is just an example; for a longer-term solution, I was thinking of taking lists of operators/commands from pages like this one (see the bottom of the page), https://www.overleaf.com/learn/latex/Operators, and writing them to some .conf or .yaml files that TexSoup comes prepackaged with. Thoughts? I was planning to do this around November, after my next paper deadline.
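
As a rough sketch of what a prepackaged signatures file could look like, the snippet below reads a .conf-style listing with Python's standard `configparser`. The `[signatures]` section name, the `required, optional` value format, and the `load_signatures` helper are all assumptions made for illustration; nothing here reflects a format TexSoup actually ships.

```python
import configparser
from typing import Dict, Tuple

# Hypothetical file contents: one "name = required, optional" entry per command.
EXAMPLE_CONF = """
[signatures]
cup = 0, 0
infty = 0, 0
Delta = 0, 0
section = 1, 1
"""


def load_signatures(text: str) -> Dict[str, Tuple[int, int]]:
    """Parse .conf text into a {command: (required, optional)} mapping."""
    parser = configparser.ConfigParser()
    # Keep keys case-sensitive, since \Delta and \delta are different commands.
    parser.optionxform = str
    parser.read_string(text)
    signatures: Dict[str, Tuple[int, int]] = {}
    for name, value in parser["signatures"].items():
        required, optional = (int(part) for part in value.split(","))
        signatures[name] = (required, optional)
    return signatures
```

A mapping loaded this way could then be merged into whatever table the parser consults, so new operators only require a data change rather than a code change.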

alvinwan avatar Oct 14 '20 21:10 alvinwan

Yeah the SIGNATURES approach seems like the way to go.

Here are some source code repos that might be a good place to get macro signatures from:

  • MathJax https://github.com/mathjax/MathJax-src/blob/master/ts/input/tex/base/BaseMappings.ts
  • plasTeX https://github.com/plastex/plastex/blob/master/plasTeX/Base/LaTeX/Math.py
  • KaTeX (specifically for (0,0) signatures) https://github.com/KaTeX/KaTeX/blob/master/src/symbols.js

No rush to fix this. I found a workaround for the specific issue: rewriting the expression as $S$ $\cup$ $[0, \infty)$ works.

ivanistheone avatar Oct 14 '20 21:10 ivanistheone

Awesome, thanks for the second opinion. And siiick, thanks so much for digging those up. 🙇

For anyone else looking at this thread, I'll make sure to reference this issue when the PR is created.

alvinwan avatar Oct 15 '20 06:10 alvinwan

The suggested workaround (adding cup: (0, 0) to SIGNATURES) does not seem to work with the equation environment:

    TexSoup(r"""$ \cup [0, \infty)$""")  # works fine
    TexSoup(r""" \begin{equation} \cup [0, \infty) \end{equation}""")  # fails

The exception TexSoup gives is:

<...>
TypeError: [Line: 0, Offset 23] Malformed argument. First and last elements must match a valid argument format. In this case, TexSoup could not find matching punctuation for: [.
Just finished parsing: ['[', '0, ', TexCmd('infty'), ') ', TexCmd('end', [BraceGroup('equation')])]

mishadr avatar May 14 '21 15:05 mishadr

\left and \right also need to be added to SIGNATURES. I think both are (1,0).

equaeghe avatar Jul 24 '21 14:07 equaeghe

A further example of non-matching brackets not parsing correctly: '\\( [ \\infty [ \\)' fails ("TexSoup could not find matching punctuation for: [."), whereas '\\( [ a [ \\)' and '[\\( \\left[ \\infty \\right[ \\) ]' are parsed correctly. Note that in French, notation such as '[0,1[' for a half-open interval is common.

tschmah avatar Oct 14 '21 17:10 tschmah