
Provide a way to deal with memory-intensive files

Open jakelangham opened this issue 6 years ago • 25 comments

Hi,

A known issue when using pgfplots for figures is exceeding TeX's main memory capacity. I have been trying out tectonic and have the same problem when compiling a document with a lot of figures.

The error message is

! TeX capacity exceeded, sorry [main memory size=5000000].

and the standard solution is to externalise plot creation by putting

\usepgfplotslibrary{external} 
\tikzexternalize

in the preamble and compiling with pdflatex -shell-escape. If I understand correctly, this won't be possible due to the way that tectonic is implemented.

Another obvious solution is to increase the memory capacity - I know how to do this in a standard TeX distribution, but can't see how to do so with tectonic. Any advice?

Thank you,

jakelangham avatar Feb 28 '19 13:02 jakelangham

Thanks for the report! This problem might be a bit tricky for Tectonic.

You're right that Tectonic doesn't allow shell escapes inside the engine, in the name of reproducibility. There have been enough cases where shell-escape comes in handy that I'd be willing to bring the feature back, under some kind of --non-reproducible flag or its moral equivalent. But I think it would be a fair amount of work to implement and it hasn't been a personal priority.

As for increasing the memory capacity, Tectonic currently just hardcodes a number at compile-time to make things simpler:

/* the size of our main "mem" array, minus 1; classically this is
 * configurable, but we hardcode it. */
#define MEM_TOP 4999999

I've had the idea in my head that it would be cool if Tectonic could actually use dynamic memory allocation (!) to grow the TeX memory pool as needed, but haven't done any research into that idea.

So at the moment, I think the best available approach would be to compile your own version of Tectonic after editing the MEM_TOP define. It's in tectonic/xetex-constants.h. I'm pretty sure things are set up correctly such that the internal constants will all remain self-consistent if you change it.
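
To spell that out, the change is a single line in that header; the value below is purely illustrative (roughly double the default), not a tuned recommendation:

/* the size of our main "mem" array, minus 1; classically this is
 * configurable, but we hardcode it. */
#define MEM_TOP 9999999 /* raised from the stock 4999999 */

Then rebuild with cargo as usual.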

What the heck is pgfplots doing that it takes so much memory, anyway?

Actually, let me rephrase that: if you have a small test case that triggers the memory overflow, it would be handy if you could paste it or attach it to this issue for future reference.

pkgw avatar Feb 28 '19 14:02 pkgw

I think the issue is simply that while pgf plots produce nice graphics, they aren't stored or handled efficiently.

If the total size of the input pgf files exceeds around 6 MB, that is enough to trigger the problem (even if the resulting PDF would be far smaller). This can easily happen with scientific plots. So a minimal example would look something like

\documentclass{article}
\usepackage{pgfplots}

\begin{document}

\input{plot.pgf}

\end{document}

where plot.pgf is a sufficiently big file - example attached: plot.pgf.zip

I'll increase MEM_TOP as suggested and let you know if it creates any issues.

jakelangham avatar Feb 28 '19 17:02 jakelangham

Hmm. Unfortunately changing MEM_TOP breaks tectonic mysteriously, producing

Running TeX ...
error: something bad happened inside TeX; its output follows:

===============================================================================
===============================================================================

error: the TeX engine had an unrecoverable error
caused by: fatal format file error

when trying to compile any file. --print / --keep-logs are no help here.

jakelangham avatar Feb 28 '19 19:02 jakelangham

Hmmm ... maybe try blowing away ~/.cache/Tectonic/formats? If you have preexisting copies of those files, I think they'll be invalid, and the code may not be clever enough to notice that.

pkgw avatar Feb 28 '19 20:02 pkgw

Thanks, that works a treat and scaled fine to my actual use case. I'll just add, for any future visitors, that because I'm doing this build on a Mac, my cache is in ~/Library/Caches/Tectonic/formats

jakelangham avatar Feb 28 '19 22:02 jakelangham

Awesome! It's good to have a workaround, although it's not exactly easy.

Here are some directions that Tectonic could go from here to help prevent this problem or make it easier to work around:

  • Increase the main memory size?
  • Make the main memory size run-time configurable?
  • Add dynamic memory allocation?
  • Fix the format file load so that changing the memory size produces an actual error message?
  • Implement shell-escape?

My available bandwidth is such that I don't expect to tackle any of these in the near future, but let's keep track of this issue.

pkgw avatar Mar 01 '19 13:03 pkgw

* Add dynamic memory allocation?

I believe LuaTeX has this, so perhaps their implementation is worth researching. Run-time configuration seems like a good and easier-to-implement compromise, which would still beat the memory management of the usual TeX toolchain [does Tectonic already have a user config file?]. Although, looking at xetex-constants.h, a lot of constants would need to become variables, so it's not exactly a trivial bit of restructuring; see the sketch below.
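
To sketch the idea (purely hypothetical; none of these names exist in the code except MEM_TOP):

/* Hypothetical sketch: the compile-time macro becomes a run-time variable
 * that is set, e.g. from a CLI flag, before the engine allocates anything. */
#include <stdint.h>

static int32_t mem_top = 4999999; /* default, matching today's MEM_TOP */

void set_mem_top(int32_t requested)
{
    /* would need validation here: the format file records the memory size
     * it was generated with, and the engine must match it */
    mem_top = requested;
}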

jakelangham avatar Mar 01 '19 13:03 jakelangham

Hi, I recently ran into the same problem. The shell-escape solution is not working for me at the moment and I don't know why. In any case, it wouldn't solve the problem when a single graph is too heavy for an engine job on its own. So I was wondering how the workaround posted in this thread could be applied today, since my attempts at modifying the memory in the same way are failing. Thank you for this amazing software, by the way.

phtalo avatar Jun 25 '22 10:06 phtalo

Hi, as far as I can see/recall it was a matter of 1) pulling the latest version of the source, 2) editing the relevant line in tectonic/xetex-constants.h as per https://github.com/tectonic-typesetting/tectonic/issues/319#issuecomment-468290512, 3) deleting the cache as per https://github.com/tectonic-typesetting/tectonic/issues/319#issuecomment-468427853, and 4) recompiling the program.

If you could indicate which step is creating problems I'll try to help if I can.

jakelangham avatar Jun 25 '22 11:06 jakelangham

With some of the recent infrastructural changes, the way to do this has evolved slightly ... hopefully in a way that makes it more likely to be successful! Here is an untested sketch of what I think should work:

  1. Edit definition of MEM_TOP in xetex_format crate.
  2. Increase LATEST_VERSION in xetex_format crate.
  3. Update FORMAT_SERIAL in engine_xetex crate to match.
  4. Regenerate xetex-format.h and xetex_bindings.h in engine_xetex crate as described in its README.

You'll then need to manually specify the bundle URL because the built-in URL template munges in FORMAT_SERIAL, and there won't be a public bundle for the new serial number.

edit: Or instead of manually specifying the bundle URL, one could edit get_fallback_bundle_url() in the bundles crate to hardcode a known format version instead of trying the "new" version number.

pkgw avatar Jun 25 '22 14:06 pkgw

Thank you @pkgw, this worked out perfectly, that much I can tell you.

@jakelangham thank you for your input too. Unfortunately, there has been a major structural overhaul since then, so it was not possible to modify the same file.

phtalo avatar Jun 27 '22 06:06 phtalo

Thank you for following up! Out of curiosity, what memory setting was big enough for you? We don't want to be excessive, but on modern machines it might not be unreasonable to increase the default memory size by a factor of a few, even if most users don't hit the limit.

pkgw avatar Jun 27 '22 13:06 pkgw

Dear @pkgw, sorry for not answering yet, but I still don't know the answer. My thesis is underway, and once it's done I will experiment with different sizes. Anyway, thank you so much.

phtalo avatar Oct 28 '22 18:10 phtalo

I also encountered this issue. Have there been any changes?

TomLebeda avatar Apr 06 '23 09:04 TomLebeda

@pkgw What is the current way to change the value? I tried building from source, but couldn't get it to work due to many missing files, e.g. fatal error: tectonic_bridge_flate.h: No such file or directory and warning: /home/jesse/Programming/eTeacher/tectonic/crates/bridge_flate/include: No such file or directory [-Wmissing-include-dirs]

jhoobergs avatar Nov 27 '23 10:11 jhoobergs

@jhoobergs Sorry for not replying sooner, but it is hard for me to guess what might have been going on with your build. It sounds like you are missing some of the files generated using cbindgen, but they should generally be stored within the Git repo or automatically generated by the build process, so I am not sure how they would end up missing for you.

pkgw avatar Feb 04 '24 17:02 pkgw

Why not make this a flag instead? I wouldn't mind giving Tectonic a few gigabytes.

Something like this?

tectonic -X compile index.tex -m 1000

where 1000 is in megabytes.

winstxnhdw avatar Feb 08 '24 12:02 winstxnhdw

I would like to see this flag or config option. I often have complex documents, spread across many files and have run into this limit.

jacobsalmela avatar Feb 08 '24 12:02 jacobsalmela

Yup, a single 3D plot at 500 samples can easily hit the limit.

winstxnhdw avatar Feb 08 '24 12:02 winstxnhdw

I have solved the problem of pgfplots running out of memory by using pgfplots' external library with lualatex (which has dynamic memory allocation, iirc). It's not pretty, but as a temporary solution it's better than nothing.

Snippet from my preamble:

\usepackage{pgfplots}
\usepgfplotslibrary{external}
\tikzset{external/system call={lualatex \tikzexternalcheckshellescape -halt-on-error -interaction=batchmode -jobname "\image" "\texsource"}}
\tikzexternalize[prefix=tikz/]

But to be honest, the setup was basically trial and error, and now I'm scared to touch it in case it breaks.

TomLebeda avatar Feb 08 '24 12:02 TomLebeda

I don’t want more TeX dependencies on my system. What I do now is paste into Overleaf and export as PDF hahaha

winstxnhdw avatar Feb 08 '24 12:02 winstxnhdw

It's not a command-line flag because, the way Tectonic currently does things, the memory limit has to be a compile-time constant. That restriction could probably be lifted with a bit of work, but I don't have a sense of how hard it would be.

pkgw avatar Feb 08 '24 13:02 pkgw

You're right. I thought it was as simple as allowing VLAs, but I took a look at the code and it does look pretty bad.

winstxnhdw avatar Feb 08 '24 14:02 winstxnhdw

It could be totally reasonable to bump up the default compile-time limit to at least support somewhat bigger documents. The tradeoff is that the memory buffers are statically allocated, so every invocation will use all of that memory whether it needs it or not.

The other thing is that currently, the format file generation process embeds whatever the memory size setting is, so if you make the memory size tunable, you need to generate a new format for each value. This isn't the end of the world by any means, but it's a bit of a drag.
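
As a rough sketch of both points (not the actual engine code; real names and types differ):

#include <stdint.h>
#include <stdlib.h>

/* Today: the arena is sized at compile time, so every invocation reserves
 * the whole thing whether the document needs it or not. */
#define MEM_TOP 4999999
static uint64_t mem[MEM_TOP + 1];

/* Tunable alternative: allocate at startup, but reconcile the requested
 * size with the size recorded in the format file being loaded. */
static uint64_t *mem_dyn;

void init_mem(size_t requested_top, size_t format_top)
{
    size_t top = requested_top < format_top ? format_top : requested_top;
    mem_dyn = calloc(top + 1, sizeof(*mem_dyn));
}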

pkgw avatar Feb 08 '24 14:02 pkgw