Get blocks of connected plain text

Open LeCyberDucky opened this issue 4 years ago • 0 comments

Hi,

I was wondering whether it is possible to extract blocks of plain text that are connected. Take the example given in the README:

from TexSoup import TexSoup
soup = TexSoup(r"""
\begin{document}

\section{Hello \textit{world}.}

\subsection{Watermelon}

(n.) A sacred fruit. Also known as:

\begin{itemize}
\item red lemon
\item life
\end{itemize}

Here is the prevalence of each synonym.

\begin{tabular}{c c}
red lemon & uncommon \\
life & common
\end{tabular}

\end{document}
""")

If I do soup.text on this, I get

['Hello ',
 'world',
 '.',
 'Watermelon',
 '(n.) A sacred fruit. Also known as:\n\n',
 ' red lemon\n',
 ' life\n',
 '\nHere is the prevalence of each synonym.\n\n',
 'c c',
 '\nred lemon & uncommon ',
 '\\\\',
 '\nlife & common\n']

From this, it is not clear that ['Hello ', 'world', '.'] belong together, since the \textit{} command split them into separate entries. Is there any way to extract plain text while preserving some indication that these parts belong together?
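
One possible workaround, sketched here with the standard library only, would be to unwrap inline formatting commands in the LaTeX source before parsing, so that the surrounding text is no longer interrupted by a command. This is not a TexSoup feature: `unwrap_inline` and `INLINE_COMMANDS` are hypothetical names, and the list of commands treated as inline is an assumption that would need adjusting per document.

```python
import re

# Hypothetical helper, not part of TexSoup. Which commands count as
# "inline formatting" is an assumption; extend or trim this tuple as needed.
INLINE_COMMANDS = ("textit", "textbf", "emph", "texttt", "underline")

# Matches e.g. \textit{world} when the braces contain no nested braces.
_INLINE_RE = re.compile(r"\\(?:%s)\{([^{}]*)\}" % "|".join(INLINE_COMMANDS))


def unwrap_inline(source: str) -> str:
    """Replace each inline command with its argument, innermost first."""
    previous = None
    while previous != source:  # repeat so nested commands are unwrapped too
        previous = source
        source = _INLINE_RE.sub(r"\1", source)
    return source


latex = r"\section{Hello \textit{world}.}"
print(unwrap_inline(latex))  # prints \section{Hello world.}
```

The cleaned source could then be passed to TexSoup as before. Whether soup.text then returns the section title as a single entry is not verified here, and the formatting information is lost, so this only helps when plain text is all that is needed.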

LeCyberDucky, Feb 21 '21 10:02