TexSoup
Punctuation sometimes is not parsed right
The following code is parsed incorrectly because "\right>}" is tokenized as a single token instead of "\right>" followed by "}".
import TexSoup
TexSoup.TexSoup(r'''\beq{ \frac{\sigma_{-}}{\left<E\right>} }''')
If we add a space between > and }, parsing succeeds:
TexSoup.TexSoup(r'''\beq{ \frac{\sigma_{-}}{\left<E\right> } }''')
I suppose that this is due to "+ 3" here https://github.com/alvinwan/TexSoup/blob/master/TexSoup/reader.py#L103
What is the particular reason to put +3 here?
I quick-fixed it in this commit: https://github.com/windj007/TexSoup/commit/5e71dd502c95caffb9a2946f9141a94c3475f799
But it does not look like a complete solution. What are the possible successors for \left and \right?
Mm, as far as I know, successors include any punctuation? lgtm
I fear there's also \left\langle, \right\rangle at the very least.
Ah darn, thanks - I'll take a closer look at this.
Found the list of all culprits! Thanks to @windj007, I just had to add them to the list of "brackets" (renamed to BRACKETS_DELIMITERS).
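For anyone following along, here is a minimal standalone sketch of the idea behind a delimiter list: instead of consuming a fixed number of characters after \left or \right, match the command followed by exactly one known delimiter. This is not TexSoup's actual reader code. The DELIMITERS list, the SIZED_DELIMITER pattern, and match_sized_delimiter are hypothetical names for illustration, and the list is only a partial set of delimiters LaTeX accepts.

```python
import re

# Hypothetical, partial delimiter list for illustration only.
# It is NOT the contents of TexSoup's BRACKETS_DELIMITERS.
DELIMITERS = [
    r"\langle", r"\rangle", r"\lfloor", r"\rfloor", r"\lceil", r"\rceil",
    r"\{", r"\}", r"\|", "(", ")", "[", "]", "<", ">", "|", "/", ".",
]

# Longest alternatives first, as a precaution against a shorter delimiter
# shadowing a longer one that starts with the same characters.
_ALTERNATION = "|".join(
    re.escape(d) for d in sorted(DELIMITERS, key=len, reverse=True)
)

# \left or \right, optional whitespace, then exactly one delimiter.
SIZED_DELIMITER = re.compile(r"\\(?:left|right)\s*(?:" + _ALTERNATION + ")")


def match_sized_delimiter(text, pos=0):
    """Return the sized-delimiter token starting at pos, or None."""
    m = SIZED_DELIMITER.match(text, pos)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in (r"\right>}", r"\right)$", r"\left\langle E", r"\rightarrow x"):
        print(repr(sample), "->", repr(match_sized_delimiter(sample)))
```

With this approach the closing "}" or "$" is never swallowed, because the token ends right after the delimiter, and commands such as \rightarrow are left alone since no delimiter follows the \right prefix.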
The same issue arises with "\right)$", but parsing succeeds when a space is added, as in "\right) $". I am using version 0.3.1. Thanks!
Looks like this may be fixed on the latest main branch! I tried the following off of main, and it looks like the assertion passes:
from TexSoup import TexSoup
soup = TexSoup(r"""$\right)$ hello""")
assert str(list(soup.descendants)[2]) == r'\right)', 'wrong punctuation'
(Feel free to reopen if the issue persists)
Edit: Looks like there are many other issues with a similar problem, like #136 -- I'll test against the latest deployed version to see if I'm just doing something wrong, or if main has fixed the problem.