TexSoup
Get blocks of connected plain text
Hi,
I was wondering whether it is possible to extract blocks of plain text that are connected. Take the example given in the README:
from TexSoup import TexSoup
# Raw string so that \b, \t and \\ in the LaTeX source are not treated as Python escapes
soup = TexSoup(r"""
\begin{document}
\section{Hello \textit{world}.}
\subsection{Watermelon}
(n.) A sacred fruit. Also known as:
\begin{itemize}
\item red lemon
\item life
\end{itemize}
Here is the prevalence of each synonym.
\begin{tabular}{c c}
red lemon & uncommon \\
life & common
\end{tabular}
\end{document}
""")
If I access soup.text on this, I get:
['Hello ',
'world',
'.',
'Watermelon',
'(n.) A sacred fruit. Also known as:\n\n',
' red lemon\n',
' life\n',
'\nHere is the prevalence of each synonym.\n\n',
'c c',
'\nred lemon & uncommon ',
'\\\\',
'\nlife & common\n']
From this, it is not clear that ['Hello ', 'world', '.'] belong together, since they were split apart by the \textit{} command. Is there any way to extract plain text while preserving some kind of indication that these parts belong together?
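To make the request concrete, here is a rough sketch of the kind of result I am hoping for. The (owner, text) pairs are purely hypothetical: soup.text does not return them, and I am not aware of a TexSoup call that does. The sketch only shows how fragments could be rejoined if each one carried a marker for the command it came from.

```python
from itertools import groupby

# Hypothetical input: each text fragment tagged with the command that owns it.
# These tags are an assumption for illustration, not something TexSoup produces.
fragments = [
    ("section", "Hello "),
    ("section", "world"),
    ("section", "."),
    ("subsection", "Watermelon"),
]

# Join consecutive fragments that share the same owner into one block of text.
blocks = [
    "".join(text for _, text in group)
    for _, group in groupby(fragments, key=lambda pair: pair[0])
]

print(blocks)  # ['Hello world.', 'Watermelon']
```

Any marker that distinguishes one owning command from the next would do; the command name is just the simplest stand-in here, and it would wrongly merge two adjacent commands of the same name.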