What exactly is being downloaded?
New LaTeX user here, I have a simple tex file:
\documentclass{minimal}
\begin{document}
Look, some text!
\end{document}
I started compiling this document over half an hour ago, and Tectonic still seems to be downloading a TON of dependencies. Is there any way I can tell it to only use the bare minimum?
One answer is that in LaTeX, "the bare minimum" is still a surprisingly large number of files. The system is designed to fetch only what it can get away with, but the engine does need to pull in all of its dependencies to create the "format file", and that is a larger set of files than the set needed to compile a document once the format file exists.
But anyway, the initial download should only take a few minutes, not half an hour. Did you do this run on a slow network?
One possible thing for us to do: document how much data needs to be downloaded to compile a bare-minimum document, so people can get a sense of how long their first run might take.
@pkgw Does the format file creation always require the same files? One could imagine providing, say, a zip file with all of these files that Tectonic could download (or be shipped with) at once. My feeling is that the initial download takes this long not because of the amount of data, but because of the number of HTTP requests.
Another simple option is to just provide a "progress bar" if the total number of files is known.
@rekka Yeah, it's a very constant set of files, and if we had a "multi-tool" type of CLI (like git) I'd definitely like the idea of providing an option to bootstrap the cache:
tectonic bootstrap-cache ~/path/to/files.zip
I think of HTTP requests as not having a very big overhead, but we're opening a new TCP connection for each request (I think!), and a lot of our files are small, so maybe that's not a good approximation here.
@pkgw Yep, something like that would be nice.
In my case the overhead for the http connection is somewhere around 1.5-2s per file, so it is quite substantial (I am in Japan, which might add a few hundred ms).
I was also curious about this, because after getting a report from a user taking ~20 minutes, I zapped my tectonic cache and redownloaded everything, and it was slow here as well. Before that, I had never noticed the population of the cache. I thought that maybe something had changed (adding hyphenation files bringing in new requirements?), or that perhaps the server was just slow that day?
It wouldn't surprise me, though, if I was simply away from the computer and never noticed the initial population of the cache.
I also have issues with very long download times when bootstrapping. At a cursory glance, I saw that Tectonic is using reqwest, but it appears to be creating a reqwest::Client for every connection. reqwest::Client holds a connection pool internally and should be reused as much as possible; reusing connections might speed up downloading many files quite a bit.
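To illustrate the pattern (a hypothetical sketch, not Tectonic's actual code, written against the current reqwest::blocking API rather than whatever version is in the tree): build one Client up front and route every download through it, so its internal pool can keep connections alive:

use reqwest::blocking::Client;

// Sketch only: one shared Client for the whole batch. Calling
// Client::new() inside the loop would throw away the connection pool
// and force a fresh TCP (and TLS) handshake for every file.
fn fetch_all(urls: &[String]) -> reqwest::Result<Vec<Vec<u8>>> {
    let client = Client::new();
    urls.iter()
        .map(|url| Ok(client.get(url.as_str()).send()?.bytes()?.to_vec()))
        .collect()
}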
@rasky I don't see that; there are 3 Client::new() calls, corresponding to these 3 phases:
note: downloading index
note: downloading SHA256SUM
note: downloading tectonic-format-latex.tex
.... etc
During the last phase, after the initial "downloading tectonic-format-latex.tex", all requests share a single reqwest::Client, but it's this phase that seems to take the majority of the time.
I would like to put in another endorsement for reducing the number of HTTP requests as much as possible. The overhead is huge in many cases! I have an ethernet connection that can download at 100MB/s, but because every file is fetched separately, it takes 0.75s/file -- and these are tiny files, so the effective download rate is ~50KB/s, or in other words over a thousand times slower due to making many connections. If requests absolutely must be made separately, we could still get a great performance improvement by doing them in parallel.
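As a rough sketch of the parallel variant (hypothetical code, nothing from Tectonic; fetch_one is a made-up stand-in for a single bundle download), even plain threads would let the per-request latencies overlap instead of accumulating:

use std::thread;

// Sketch only: overlap per-request latency by fetching concurrently.
// For thousands of files a bounded worker pool would be better than
// one thread per file, but the principle is the same.
fn fetch_in_parallel(urls: Vec<String>) -> Vec<reqwest::Result<Vec<u8>>> {
    let handles: Vec<_> = urls
        .into_iter()
        .map(|url| thread::spawn(move || fetch_one(&url)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

// Stand-in for whatever performs one download; per the pooling notes
// above, a shared Client would be preferable in real code.
fn fetch_one(url: &str) -> reqwest::Result<Vec<u8>> {
    Ok(reqwest::blocking::get(url)?.bytes()?.to_vec())
}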
Perhaps Peter can clarify, but as far as I can tell, the problem is that it downloads while parsing: it stops parsing, downloads the file, and continues parsing once it has received the data. The data received can then cause it to download more files, and so on.
I think that fixing this might require more data on the server, e.g. parsing through all the files and building a graph of each file's dependencies, so that the client can download a table of contents and then fetch batches of files and their dependencies in parallel.
But I don't think the client by itself has enough information to keep parsing far enough ahead to encounter the next file and start downloading it in parallel, without some help.
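To make that concrete, a hypothetical sketch (the manifest shape and names are invented; Tectonic ships nothing like this today): with a server-provided table mapping each file to its direct dependencies, the client could expand what it needs to the transitive closure and request the whole batch at once:

use std::collections::{HashMap, HashSet, VecDeque};

// Sketch only: breadth-first expansion of a seed set of files to
// everything they transitively depend on, per a server-built manifest.
fn transitive_deps(
    manifest: &HashMap<String, Vec<String>>,
    seeds: &[String],
) -> HashSet<String> {
    let mut needed: HashSet<String> = seeds.iter().cloned().collect();
    let mut queue: VecDeque<String> = seeds.iter().cloned().collect();
    while let Some(file) = queue.pop_front() {
        for dep in manifest.get(&file).into_iter().flatten() {
            // insert() returns true only the first time we see a file,
            // so each dependency is enqueued at most once.
            if needed.insert(dep.clone()) {
                queue.push_back(dep.clone());
            }
        }
    }
    needed
}

The resulting set could then be handed to a parallel fetcher like the one sketched earlier in the thread.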
@ratmice That's definitely a factor, but should be surmountable. Tectonic could keep a connection to the server open in the background and try to reuse it as much as possible. Servers might aggressively drop idle connections, but the worst case is that you just reconnect as we often do now.
You could map out dependencies of various files to a certain extent, but those dependencies might vary depending on the engine state and characteristics of the input document, so they're not well-defined.
Just to be clear, I have never really sat down and understood how HTTP connection reuse is or isn't actually working out in Tectonic in practice, so it is quite possible that there are some easy wins to be had! I would love to push on this topic but haven't been able to sit down and have a good coding session for a long time.
@pkgw I've looked at it a little bit, and afaict connection reuse should be working for the vast majority of connections. The connection pool should be set up in cached_itarbundle.rs in HttpRangeReader::new(). There is a constant number of initial connections, but the majority of connections under get_everything reuse the HttpRangeReader, and thus should reuse reqwest's internal connection pooling.
With the version of reqwest we are using, the timeout seems to default to 30s, which seems reasonable. I haven't been able to sit down and look at reqwest's internals, however, to see if the pooling mechanism is actually working.
From the Tectonic side I didn't see anything egregious, but I will try to look into reqwest's internal pooling when I can; perhaps it closes the connection when the number of outstanding requests reaches 0, which would happen after every request.
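For reference, on a current reqwest release the relevant knobs are on the client builder; a sketch (this is a newer API than the version discussed in this thread, so these exact methods may not exist there):

use std::time::Duration;
use reqwest::blocking::Client;

// Sketch only: an explicit request timeout plus a generous pool idle
// timeout, so pooled connections aren't dropped the moment the number
// of outstanding requests hits zero.
fn make_client() -> reqwest::Result<Client> {
    Client::builder()
        .timeout(Duration::from_secs(30))
        .pool_idle_timeout(Duration::from_secs(90))
        .build()
}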
This was unexpected to me: I have a TeX Live installation, but today when I ran tectonic it suddenly started downloading a bunch of sty files. Before auto-downloading anything, I'd really recommend prompting the user first about whether these files should be downloaded. Currently I'm trying to find out where these files were actually downloaded to, as I don't need them.
Now tectonic has simply started failing to download whatever it tries to download, due to a DNS error. Please, is there any way to force it to stop trying to do magic and see that I have TeX Live already installed? Otherwise the tool has just become a brick; it can't do anything useful...