pylatexenc
Support for `\newenvironment` wrapping another environment
`latex2text` fails when parsing a document that contains a `\newenvironment` command that wraps an existing environment. I have been able to narrow it down to the following minimal example:
latex2text --code '\newenvironment{annotate}{\begin{scope}}{\end{scope}}'
which gives the following output:
INFO:pylatexenc.latexwalker:Ignoring parse error (tolerant parsing mode): Unexpected mismatching closing brace: '}' @(1,39)
INFO:pylatexenc.latexwalker:Ignoring parse error (tolerant parsing mode): Unexpected closing environment: 'scope' @(1,41)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/pylatexenc/latexwalker/__init__.py", line 2248, in do_read
    mspec.parse_args(w=self, pos=tok.pos + tok.len,
  File "/usr/local/lib/python3.8/site-packages/pylatexenc/macrospec/__init__.py", line 95, in parse_args
    return self.args_parser.parse_args(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pylatexenc/macrospec/_argparsers.py", line 293, in parse_args
    (node, np, nl) = w.get_latex_expression(
  File "/usr/local/lib/python3.8/site-packages/pylatexenc/latexwalker/__init__.py", line 1551, in get_latex_expression
    tok = self.get_token(pos, environments=False, parsing_state=parsing_state)
  File "/usr/local/lib/python3.8/site-packages/pylatexenc/latexwalker/__init__.py", line 1356, in get_token
    raise LatexWalkerEndOfStream(final_space=space)
pylatexenc.latexwalker.LatexWalkerEndOfStream

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/latex2text", line 11, in <module>
    load_entry_point('pylatexenc==2.8', 'console_scripts', 'latex2text')()
  File "/usr/local/lib/python3.8/site-packages/pylatexenc/latex2text/__main__.py", line 190, in main
    (nodelist, pos, len_) = lw.get_latex_nodes()
  File "/usr/local/lib/python3.8/site-packages/pylatexenc/latexwalker/__init__.py", line 2351, in get_latex_nodes
    r_endnow = do_read(nodelist, p)
  File "/usr/local/lib/python3.8/site-packages/pylatexenc/latexwalker/__init__.py", line 2251, in do_read
    e = self._exchandle_parse_subexpression(
  File "/usr/local/lib/python3.8/site-packages/pylatexenc/latexwalker/__init__.py", line 1862, in _exchandle_parse_subexpression
    e.open_contexts.append(
AttributeError: 'LatexWalkerEndOfStream' object has no attribute 'open_contexts'
By trial and error, I found that parsing works if I add a custom definition `macrospec.std_macro('newenvironment', "*[[{{")`, i.e. remove the first `{` argument from the default `*{[[{{`.
Hi and thanks for the report. There are two points to unpack here.
First, you're seeing an error mainly because pylatexenc does not really support the `\newcommand`/`\newenvironment` family of commands (see my comment in issue 48). It attempts to parse the arguments to `\newenvironment` like LaTeX blocks of text; it doesn't record in any way the new command or new environment for future parsing. In most cases you might not see any errors in tolerant parsing mode, but what really happens is that the `\newcommand`/`\newenvironment` instruction gets ignored (it gets parsed as a simple macro node and then converted to empty text), and then later in the document custom macros are handled using the default behavior for unknown macros or environments (which might or might not give you the desired behavior). I have plans for better support of `\newcommand`/`\newenvironment` commands, but they haven't been fully implemented yet. I've written some (experimental) code in my other project latexpp that expands some commands defined by `\newcommand`/`\newenvironment`. Depending on your use case, you might be able to reuse some code from https://github.com/phfaist/latexpp/blob/master/latexpp/fixes/newcommand.py for your purposes.
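To illustrate why reading the definition bodies as ordinary LaTeX goes wrong, here is a toy nesting checker. It is a sketch only and not pylatexenc's actual algorithm: the regex tokenizer and the name `find_nesting_errors` are made up for this example, and escaped braces and comments are ignored. It keeps one stack of open contexts (brace groups and environments) and reports every closer that does not match the innermost open context.

```python
import re

# Toy tokenizer (assumption for illustration): recognizes \begin{name},
# \end{name}, and bare braces. Escaped braces and comments are not handled.
TOKEN_RE = re.compile(r"\\(begin|end)\{([^{}]*)\}|([{}])")


def find_nesting_errors(source):
    """Return messages for closers that do not match the innermost open context."""
    stack = []   # entries are ("brace", None) or ("env", name)
    errors = []
    for m in TOKEN_RE.finditer(source):
        kind, envname, brace = m.group(1), m.group(2), m.group(3)
        if brace == "{":
            stack.append(("brace", None))
        elif kind == "begin":
            stack.append(("env", envname))
        elif brace == "}":
            if stack and stack[-1][0] == "brace":
                stack.pop()
            else:
                # A brace closes while an environment is still open.
                errors.append("mismatching '}' at offset %d" % m.start())
        else:  # \end{name}
            if stack and stack[-1] == ("env", envname):
                stack.pop()
            else:
                errors.append("unexpected \\end{%s} at offset %d"
                              % (envname, m.start()))
    return errors


if __name__ == "__main__":
    example = r"\newenvironment{annotate}{\begin{scope}}{\end{scope}}"
    for message in find_nesting_errors(example):
        print(message)
```

On the minimal example from the report, this toy flags the `}` at offset 39 and the `\end{scope}` at offset 41, the same positions as the two messages in the tolerant-mode log above: `\begin{scope}` opens an environment inside the second argument, so the brace that should end that argument no longer matches anything.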
On the other hand, the additional exception you're seeing ("AttributeError: 'LatexWalkerEndOfStream' object has no attribute 'open_contexts'") is a bug and I'll look to fix it. Thanks for reporting.
My latest commit should fix the weird chained exception that you reported. I'm leaving the issue open as an enhancement to enable `\newenvironment` wrapping another environment. This should be supported once I get the `\newcommand` family of friends supported. (See also #48.) Thanks for reporting!
Thanks! I have tested the master branch and can confirm that the parsing error is gone. :)
I also noticed that now (and also with my proposed workaround) the resulting node representation is kind of screwed up, and subsequent LaTeX code is still regarded as belonging to the `\newenvironment` command. By coincidence, this does not bother me in my special use case, so I am happy as long as I can parse the file without exceptions.
I guess in general it is still desirable to be able to parse `\newenvironment` properly and represent `\begin` and `\end` commands in the arguments in some reasonable way.
FYI: My use case is a LaTeX to text script that supports all the special macros I usually use and can handle includes with the `standalone` package. That includes some hacks to throw away the preambles from imported documents and lots of special formatters for custom commands. So in this case, I do not want `\newcommand` definitions to be parsed and applied like latexpp does, but I am perfectly happy with having a parsed node list that I can transform as needed.
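For a use case like this one, where the definitions themselves are not needed, one possible preprocessing hack is to drop `\newenvironment` definitions from the source before handing it to the parser. The following is a minimal sketch under stated assumptions, not part of pylatexenc: the helper names are hypothetical, arguments are delimited by plain brace matching (so `\begin`/`\end` inside them are never interpreted), and comments, nested brackets in optional arguments, and macros whose names merely start with `\newenvironment` are not handled.

```python
def _skip_ws(source, pos):
    """Advance past whitespace starting at pos."""
    while pos < len(source) and source[pos].isspace():
        pos += 1
    return pos


def skip_braced_group(source, pos):
    """Return the index just past the brace-balanced {...} group at pos."""
    pos = _skip_ws(source, pos)
    if pos >= len(source) or source[pos] != "{":
        raise ValueError("expected '{' at offset %d" % pos)
    depth = 0
    i = pos
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            i += 2           # skip the escaped character, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced braces starting at offset %d" % pos)


def skip_optional_arg(source, pos):
    """Skip a [...] argument if one is present; otherwise return pos unchanged."""
    start = _skip_ws(source, pos)
    if start < len(source) and source[start] == "[":
        end = source.find("]", start)
        if end == -1:
            raise ValueError("unterminated '[' at offset %d" % start)
        return end + 1
    return pos


def strip_newenvironment(source):
    """Remove \\newenvironment[*]{name}[n][default]{begin}{end} definitions."""
    marker = "\\newenvironment"
    out = []
    pos = 0
    while True:
        start = source.find(marker, pos)
        if start == -1:
            out.append(source[pos:])
            return "".join(out)
        out.append(source[pos:start])
        i = start + len(marker)
        if i < len(source) and source[i] == "*":
            i += 1
        i = skip_braced_group(source, i)   # {name}
        i = skip_optional_arg(source, i)   # [number of arguments]
        i = skip_optional_arg(source, i)   # [default for first argument]
        i = skip_braced_group(source, i)   # {begin code}
        i = skip_braced_group(source, i)   # {end code}
        pos = i
```

The cleaned string can then be parsed as usual. Because the definition is removed entirely, later uses of the custom environment fall back to whatever the parser does for unknown environments.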