pythontex
How does PythonTeX work precisely “under the hood”? [Haskell support]
I would like to ask you the same question I posted earlier on TeX.SE (since I got no answer, and the question is mainly about the internal workings of PythonTeX, I hope it is OK to ask here directly):
https://tex.stackexchange.com/questions/548737/how-does-pythontex-work-precisely-under-the-hood?noredirect=1#comment1386126_548737
To typeset code and its output correctly, PythonTeX requires at least a three-step compilation:
(lua/pdf/xe)latex
pythontex
(lua/pdf/xe)latex
On Windows 10, the first run creates (among other files) \jobname.pytxcode, the pythontex run then creates (among others) files with the .stdout extension, and the third run reads the content of these files and typesets it.
Is that (at least remotely) correct? And how precisely does it work, on the LaTeX side but also on the Python side? I have been working with PythonTeX for some time, but I have found that I am using it mostly as a "black box."
Motivation: I would like to create support for Haskell in PythonTeX. But Haskell has pretty specific IO, and it would be very helpful to me (and hopefully also to others) to know precisely how PythonTeX works.
I should add that I am a Haskell newbie/hobbyist, but since writing support for another language in PythonTeX shouldn't be complicated, I have started a small prototype (based on the support for R). Right now I am also considering whether to use ghc, ghci, or runghc.
During the first latex run, all of the code and associated settings are saved to \jobname.pytxcode. When pythontex runs, it extracts the code from \jobname.pytxcode, uses templates to assemble the code into temp files, executes the temp files, and then converts the output into files or LaTeX macros that LaTeX can use (.stdout and .pytxmcr). The second latex run imports the files and macros and typesets everything.
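To illustrate the "converts the output into files" step, here is a minimal Python sketch. It assumes, hypothetically, that the executed program prints a delimiter line before each chunk's output (as the wrapper templates in the bash example do), and it shows only how one combined stdout stream could be split back into per-chunk pieces. The function name and delimiter string are made up. This is not PythonTeX's actual code.

```python
from __future__ import annotations


def split_chunks(stdout_text: str, delim: str) -> list[str]:
    """Split combined stdout into per-chunk outputs.

    Assumes each chunk's output is preceded by a line that consists only of
    `delim` (a hypothetical convention for this sketch).
    """
    chunks: list[str] = []
    current: list[str] | None = None  # stays None until the first delimiter
    for line in stdout_text.splitlines():
        if line == delim:
            if current is not None:
                chunks.append("\n".join(current))  # close the previous chunk
            current = []
        elif current is not None:
            current.append(line)
        # lines printed before the first delimiter are dropped
    if current is not None:
        chunks.append("\n".join(current))  # close the last chunk
    return chunks


if __name__ == "__main__":
    DELIM = "=>CHUNK-START"  # made-up delimiter for illustration
    combined = f"{DELIM}\nhello\n{DELIM}\n42\nworld\n"
    for number, text in enumerate(split_chunks(combined, DELIM)):
        # a real tool would write each piece to its own file instead
        print(number, repr(text))
```

Using a delimiter line, rather than counting lines, means chunks that print nothing or print many lines are handled the same way.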
To add another language, you will want to look at pythontex_engines.py. Looking at the R template is a fine place to start. You might also want to look at the CodeEngine for Rust, since it is compiled.
For a really basic example, you might also look at bash:
bash_template = '''
cd "{workingdir}"
{body}
echo "{dependencies_delim}"
echo "{created_delim}"
'''
bash_wrapper = '''
echo "{stdoutdelim}"
>&2 echo "{stderrdelim}"
{code}
'''
bash_sub = '''echo "{field_delim}"\necho {field}\n'''
CodeEngine('bash', 'bash', '.sh',
'{bash} "{file}.sh"',
bash_template, bash_wrapper, '{code}', bash_sub,
['error', 'Error'], ['warning', 'Warning'],
'line {number}')
Basically, you need a whole-program/script template that changes to the working directory, includes the body of the code, and then writes some delimiters at the end. Each chunk of code needs a wrapper that writes a delimiter to stdout and to stderr at the very beginning, before the code. If you want the substitution environments to work (not required), you also need a template that writes a delimiter to stdout followed by a string representation of whatever corresponds to the substitution template field. The CodeEngine specifies the name, the language name, the file extension, the command to run (this can also be a list; see Rust), the templates, what errors look like, what warnings look like, and a template for what references to line numbers in errors/warnings look like.
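To make the role of each template more concrete, here is a rough Python sketch of how the bash templates above could be filled in with str.format. Each chunk is wrapped first, and then the wrapped chunks become {body}. The delimiter values, the working directory, and the assemble_script helper are hypothetical. PythonTeX's real assembly code is more involved.

```python
from __future__ import annotations

# Templates copied from the bash engine shown above
bash_template = '''
cd "{workingdir}"
{body}
echo "{dependencies_delim}"
echo "{created_delim}"
'''
bash_wrapper = '''
echo "{stdoutdelim}"
>&2 echo "{stderrdelim}"
{code}
'''


def assemble_script(chunks: list[str], delims: dict[str, str],
                    workingdir: str) -> str:
    """Wrap each user chunk, then place the wrapped chunks into the template."""
    wrapped = [
        bash_wrapper.format(
            stdoutdelim=delims["stdout"],
            stderrdelim=delims["stderr"],
            code=chunk,  # user code is inserted as a value, so braces are safe
        )
        for chunk in chunks
    ]
    return bash_template.format(
        workingdir=workingdir,
        body="".join(wrapped),
        dependencies_delim=delims["dependencies"],
        created_delim=delims["created"],
    )


if __name__ == "__main__":
    # Made-up delimiter strings, for illustration only
    delims = {"stdout": "#STDOUT#", "stderr": "#STDERR#",
              "dependencies": "#DEPS#", "created": "#CREATED#"}
    print(assemble_script(['echo "a"', 'echo "b"'], delims, "."))
```

The script itself only prints to the console. Reading the delimiters back out of the captured output is a separate step.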
Let me know if you have questions. Also, I think you have seen https://github.com/gpoore/codebraid, my other project that is like PythonTeX for Pandoc Markdown. If using that is an option for you (you can typically just use LaTeX mixed in with Markdown), I believe there are Jupyter kernels for Haskell that already exist.
The part about "assembling temporary files" is a little arcane to me. In the bash support there are echo commands, but as far as I know, in this form they only print information to the console. For redirecting stdout to a file, I have found only this:
https://stackoverflow.com/questions/418896/how-to-redirect-output-to-a-file-and-stdout
It shows commands using the pipe operator (if I am correct) to send the otherwise only displayed output to a file.
Am I misunderstanding this interaction? Or does the CodeEngine class do some behind-the-scenes operation to grab the console output and pipe it to file.stdout?
Motivation: If basic support for a language is mostly about "printing information to the console" and not about dealing with IO, then support for Haskell could simply be rewritten with Haskell console-printing commands (with conversion to a printable type), something along these lines (I still have to test it):
haskell_template = '''
setCurrentDirectory "{workingdir}"
{body}
putStrLn show {dependencies_delim}
putStrLn show {created_delim}
'''
etc ...
But if there is "actual" file writing, then Haskell is a little more complicated to approach.
I hope I am not making this overly complicated ...
Thank you for the suggestion about Rust; I will take a very close look at it. At a glance, rust_tex_utils looks pretty complicated, but I guess you meant the part inside the main function. The "Haskell Platform" comes with ghc, which allows execution in three modes:
- "standard" ghc compiler: similar to Rust, but I would like to avoid the limitation of the Rust support that prohibits using a main function in user code.
- ghci REPL: much like julia, I guess.
- runghc: allows executing Haskell code as a "script", but the code must also contain a main function (so it eventually runs into the same limitation as with ghc).
runghc might be the simplest case, but I am trying to think about how to implement ghc as such.
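One practical difference on the compiled route: ghc --make only compiles and links the program, so running it takes a second step, whereas runghc does both at once. Since a CodeEngine command can also be a list (as for Rust), one way to picture the compiled route is the Python sketch below, which runs commands in order and stops at the first failure. The run_in_order helper is hypothetical and uses Python one-liners as stand-ins. The ghc command in the comment is only an illustration, not what PythonTeX actually executes.

```python
from __future__ import annotations

import subprocess
import sys


def run_in_order(commands: list[list[str]]) -> int:
    """Run each command in turn; stop at the first non-zero exit code."""
    for cmd in commands:
        returncode = subprocess.run(cmd).returncode
        if returncode != 0:
            return returncode  # e.g. a compile error skips the run step
    return 0


if __name__ == "__main__":
    # Hypothetical Haskell pipeline (not executed here):
    #   [["ghc", "--make", "demo.hs", "-o", "demo"], ["./demo"]]
    # Stand-in commands using the current Python interpreter instead:
    steps = [[sys.executable, "-c", "print('compile step')"],
             [sys.executable, "-c", "print('run step')"]]
    print(run_in_order(steps))
```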
I have looked at Codebraid and certainly want to try it out, but I have an ongoing project already written in LaTeX, and I am not aware of any pandoc processing that would allow having part of a document in .tex format and part in .md. Is that possible?
Actually, as far as I know, pandoc by itself should allow integrating Haskell code (and I think also executing it), but I have never tried that. As for Jupyter, I tried to use the IHaskell kernel, but as a Windows user I had to use WSL to run it, and even then the kernel did not load into Jupyter.
I should add that I program as a hobby (which might be obvious), so I might be lacking some "common knowledge." Also, I am a Windows user (considering moving to Linux), which also affects the magnitude of the issues I am dealing with.
All of the printing in the templates is to the console (to stdout and to stderr), and then PythonTeX handles capturing those (currently, they are redirected to a file, but that is all handled by PythonTeX). I don't have any significant experience with Haskell, but it looks like putStrLn and hPutStrLn stderr will do what is needed.
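The capturing side can be sketched in Python with subprocess: the child program only prints, and the parent process redirects the child's stdout and stderr into files. This is a minimal illustration of the general mechanism, not PythonTeX's implementation. A Python one-liner stands in for the compiled program, and the file names and the run_and_capture helper are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def run_and_capture(cmd: list[str], out_path: Path, err_path: Path) -> int:
    """Run cmd with its stdout/stderr redirected to files; return exit code."""
    with open(out_path, "w", encoding="utf-8") as out, \
            open(err_path, "w", encoding="utf-8") as err:
        result = subprocess.run(cmd, stdout=out, stderr=err)
    return result.returncode


if __name__ == "__main__":
    # The child only prints to the console; it never opens a file itself
    child = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"
    code = run_and_capture([sys.executable, "-c", child],
                           Path("demo.stdout"), Path("demo.stderr"))
    print(code, Path("demo.stdout").read_text(encoding="utf-8"))
```

This is why the Haskell side only needs putStrLn and hPutStrLn stderr. The redirection to files happens entirely in the calling process.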
Regarding Rust as an example: you can ignore all of the utils code. That is only required if you want dependency tracking and other more advanced features. None of it is needed for basic functioning.
Currently, PythonTeX doesn't support running code through REPLs, so ghci probably won't be an option. There are REPL/console-style modes for Python and Julia, but that is because there are special systems for running code that emulate REPL/console execution; they don't actually use a REPL/console. Actually using a REPL/console is technically possible, but difficult to get right. I made some progress with the basics a few years back, but never got very far and ran into many issues. I will probably add some limited REPL/console support to Codebraid soon.
Given how PythonTeX works, there probably isn't a way to avoid the prohibition on main() in user code. The start and end of main() will probably have to be in the overall template, rather than entered by the user. Codebraid has a better system that allows you to set outside_main=true and handle everything yourself.
Depending on how complex your project is, it might be possible to send the .tex through pandoc to Markdown and then use that with Codebraid. Actually, another option might be to write some things in Markdown with Codebraid, then convert that to .tex with pandoc and \input that into your current document.
Since Pandoc can input LaTeX, it should be possible to get Codebraid to work with LaTeX in addition to Markdown. I just haven't gotten that far yet.
Everything should work with Windows. I'm usually working under Windows myself.
I have managed to write basic support for Haskell as follows:
haskell_template = '''
import System.Directory
import System.IO
main = do
    setCurrentDirectory "{workingdir}"
    {body}
    putStrLn "{dependencies_delim}"
    putStrLn "{created_delim}"
'''
haskell_wrapper = '''
putStrLn "{stdoutdelim}"
hPutStrLn stderr "{stderrdelim}"
{code}
'''
haskell_sub = '''
putStrLn "{field_delim}"
putStrLn "{field}"
'''
CodeEngine('haskell', 'haskell', '.hs',
'{ghc} --make "{file}.hs"',
haskell_template, haskell_wrapper, 'putStrLn {code}', haskell_sub,
['error', 'Error'], ['warning', 'Warning'],
'line {number}')
SubCodeEngine('haskell', 'hs')
I also added this to pythontex.sty at line 1377:
\ifstrequal{#1}{haskell}{\makepythontexfamily[pyglexer=haskell]{haskell}}{}%
With this setup, I very often get parse errors from ghc. These are usually caused by putting imports somewhere other than the beginning of the script, or by incorrect indentation.
Consider almost the simplest Haskell code:
putStrLn "a"
With the setup above, what exactly is the content of the source file passed to ghc?
Something like:
import System.Directory
import System.IO
main = do
setCurrentDirectory "{workingdir}"
putStrLn "{stdoutdelim}"
hPutStrLn stderr "{stderrdelim}"
putStrLn "a"
putStrLn "{dependencies_delim}"
putStrLn "{created_delim}"
Or something else? Does any space gobbling happen anywhere in the process? The errors I am getting suggest so, but I am not sure.
I am also looking to work more with pandoc (though it supports only a subset of LaTeX) or ConTeXt, which could be more suitable for future projects. Still, I would like to put some more time into adding support to pythontex as well, if I am up to it.
You can use \usepackage[keeptemps]{pythontex} to keep all temp files in the pythontex-files-* directory. That way, you can see exactly what is being executed.
Just looking at this, one issue seems to be indentation. You want everything under main = do to be indented, but the template code (wrapper) and the code you are supplying yourself are not indented. To fix this, we would need to add an option to add indentation to all template and user code. That should be a straightforward process. I can look into adding that feature.
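One way such an indentation option could work is sketched below in Python. It uses textwrap.indent so that every line of the wrapped code, not just the first, ends up under main = do. The templates follow the Haskell ones posted above, but {body} is placed at column 0 because the helper indents the whole body itself. The main :: IO () signature, the four-space indent width, and the assemble_haskell helper are assumptions for this sketch.

```python
from __future__ import annotations

import textwrap

# Based on the Haskell templates posted above; {body} sits at column 0 here
# because assemble_haskell indents every body line itself (assumption).
haskell_template = '''\
import System.Directory
import System.IO

main :: IO ()
main = do
    setCurrentDirectory "{workingdir}"
{body}
    putStrLn "{dependencies_delim}"
    putStrLn "{created_delim}"
'''
haskell_wrapper = '''\
putStrLn "{stdoutdelim}"
hPutStrLn stderr "{stderrdelim}"
{code}
'''


def assemble_haskell(chunks: list[str], delims: dict[str, str],
                     workingdir: str, indent: str = "    ") -> str:
    """Wrap each chunk, then indent every non-blank body line under main."""
    body = "".join(
        haskell_wrapper.format(
            stdoutdelim=delims["stdout"],
            stderrdelim=delims["stderr"],
            # remove indentation common to all lines of the user's chunk
            code=textwrap.dedent(chunk).strip("\n"),
        )
        for chunk in chunks
    )
    return haskell_template.format(
        workingdir=workingdir,
        body=textwrap.indent(body, indent).rstrip("\n"),
        dependencies_delim=delims["dependencies"],
        created_delim=delims["created"],
    )


if __name__ == "__main__":
    delims = {"stdout": "#STDOUT#", "stderr": "#STDERR#",
              "dependencies": "#DEPS#", "created": "#CREATED#"}
    print(assemble_haskell(['putStrLn "a"'], delims, "."))
```

For the putStrLn "a" example above, every line after main = do comes out indented by four spaces. That is the layout ghc expects for a do block.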
If I am understanding correctly, import must always be used before function definitions, so import will never work within user code, because that will be inside main. There are a few ways to try to work around that.
- You could add more imports to the template. But that will always limit things somewhat.
- Otherwise, the overall code execution system would need some sort of modification to allow code to be inserted before main.
  - It might be possible to look at the first chunk of code and relocate every line that starts with import to before main. I did something similar with Python to deal with imports from __future__. Although at that point it seems like we're starting to create something that isn't exactly Haskell anymore.
  - Another option would be having a way to insert code into the template before main. Codebraid has limited capabilities for this sort of thing, but PythonTeX doesn't at this point. That is definitely doable; it would just require more work.
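The import-relocation idea from the list above could look roughly like this Python sketch. It scans the first chunk, pulls out every line that begins with import, and returns those lines separately so that a template could place them before main. The split_imports helper is hypothetical. As a simplifying assumption, it handles only import declarations that fit on a single line.

```python
from __future__ import annotations


def split_imports(first_chunk: str) -> tuple[list[str], str]:
    """Separate single-line `import ...` declarations from the rest of a chunk.

    Multi-line import lists are not handled (simplifying assumption).
    """
    imports: list[str] = []
    rest: list[str] = []
    for line in first_chunk.splitlines():
        stripped = line.strip()
        if stripped.startswith("import "):
            imports.append(stripped)  # to be emitted before `main = do`
        else:
            rest.append(line)  # stays in the body inside main
    return imports, "\n".join(rest)


if __name__ == "__main__":
    chunk = 'import Data.List (sort)\nprint (sort [3, 1, 2])'
    imports, body = split_imports(chunk)
    print(imports)  # ['import Data.List (sort)']
    print(body)     # print (sort [3, 1, 2])
```

Requiring a space after import keeps identifiers such as important from being moved by mistake.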
Regarding Pandoc: Recent versions now have raw inlines and raw blocks, so I believe that removes most LaTeX limitations. For example, `\LaTeX`{=latex} gets passed straight through to LaTeX without modification, and the same thing is possible for code blocks by starting with ```{=latex}. With what you have already put together here, I could probably add Haskell support for Codebraid (run Haskell in Pandoc Markdown) relatively easily if that would be helpful.
With pandoc, there is already support for literate Haskell by means of an extension. I think there is no substantial difference between executing or compiling literate Haskell and "normal" Haskell.
My question was motivated mostly by the option to add Haskell support to PythonTeX. As you have written, it seems sensible mostly if there is an option to put code outside of the main function. Adding indentation should be doable, as should moving import declarations around, but I believe most Haskell coding happens outside of the main function.
Are you interested in adding a feature to allow code outside of main? I know you have already mentioned that this would also be desirable for Rust. How could I help with that?
With that, codebraid and pandoc itself are definitely an option, even though they are harder to automate (with PythonTeX, I can now simply leave compilation to arara and check the result after an hour ...)
Sorry for the delay in responding. I'm back to in-person teaching combined with hybrid/online in some cases to handle quarantined students, and that's severely limiting my time for software projects. If you can come up with a way to get Haskell working with PythonTeX without too many changes, I'm happy to accept a pull request. Otherwise, I may be able to think about this again in a few months. My eventual goal is to replace the code execution part of PythonTeX with the code execution part of Codebraid, and if I ever have some time to do that, then supporting Haskell should be trivial.
It's alright; these are weird and complicated times.
I was actually taking this issue as postponed after your previous comment. With your latest reply, I am even more inclined to leave this issue as it is for now and, when I need Haskell, to use codebraid until PythonTeX has the same capabilities. I have actually started to migrate my new projects to pandoc processing (and with that, in time, to use codebraid over PythonTeX when suitable), but it takes some time ...
From time to time I also look at the codebraid issue page, and there are some wonderful things in motion, so for the time being I might even use codebraid more than PythonTeX.
I will watch both projects closely and try to help whenever possible (and able), and after the PythonTeX update I will look into adding Haskell to its family of supported languages.