tectonic
tectonic copied to clipboard
Bundle rework
Our current bundle scheme is causing quite a few bugs. This PR (paired with tectonic-typesetting/tectonic-texlive-bundles#14) resolves many of these by implementing proper search paths in bundles.
These changes are not be breaking. Old zip and bundles will work with this PR. (I have, however, removed the scripts that create them. They should eventually be deprecated.)
Changes in this PR:
- A new bundle format (see tectonic-typesetting/tectonic-texlive-bundles#14)
- File search paths (also documented in the bundle repo)
- Bundle autodetection from url (thus,
--web-bundlehas been removed, and--bundlenow handles both URLs and paths) - Bundle caching rework (mostly invisible to the user, enables cleaner code)
- Cache organization rework (the bundle cache needs fewer files and is in its own subdir)
- Many minor bundle-related cleanups and optimizations
Most changes in this PR are in the bundles crate. Changes in other files are usually minor side-effects thanks to a slightly different API.
Thanks for all this work! As I remarked in the texlive-bundles PR, I wasn't getting any notifications about your work on this topic. I hope to start reading over your PRs and understanding what you've been working on, although to be honest it will likely take me a lot longer to do so than I'd like.
Alright, I think I've implemented all the extra bundle features we need to make Tectonic's bundles actually work.
The code here is very close to done, except for a few minor problems (listed at the top) and whatever edits we come up with.
Thanks again for your work on this.
Will this new caching scheme work if multiple Tectonic processes are using and writing to the cache simultaneously? This scenario can easily happen if you have a work tree with several documents asnd Tectonic wired into a Make or Ninja build tool that runs multiple jobs at the same time.
At the moment, it looks like this PR is going to need updating to fix conflicts with #1132 . It will also need a pass to fix a bunch of Clippy complaints, and the test failure(s) of course. At the moment I see two tests failing on my laptop: test_cache_location_redirect and test_bundle_update. Unfortunately the Azure build logs have expired (and we can't get new build logs until the merge conflicts are resolved), so I don't know if that aligns with what the CI was finding.
Will this new caching scheme work if multiple Tectonic processes are using and writing to the cache simultaneously?
No clue. I'll look into this later. I'm not sure the previous cache was fully safe... It should be ok most of the time, since cache files are only read, but two processes downloading the same file at the same time could cause problems.
As for cleanup, I'll get to that eventually. Cleanup is the reason tectonic-typesetting/tectonic-texlive-bundles#14 is still in draft mode. This PR and that PR solve the same problem.
The previous cache did work in the parallel use case, thanks to three design choices:
- The cache overall is append-only
- Files are downloaded to a temporary path, and then atomically
rename()ed into their final locations, so that incomplete files are never observed in the cache - Information about downloaded files is logged into a line-oriented text file in append mode; the OS guarantees that lines may interleave but won't overwrite or clash with each other.
So if two processes try to download the same file, you might do extraneous work (download files twice, rename them twice, add two lines to the index file) but it will all be self-consistent and the resulting files will be well-structured.
Since it took me 25 years to finally learn this: the semantics of "append mode" on Unix are that "whenever you write to a file, the OS guarantees that it is as if you did a seek() to the end of the file immediately (and atomically) before the write occurs". The OS also guarantees that sufficiently small writes (~2k) are atomic. Between the two of those guarantees, if you have two processes appending to the same file in small chunks, you can be sure that you'll get both of their contributions without stepping on each others' toes.
Alright, that should be it. The new cache is multi-process-safe via renames, tests pass, and clippy errors have been fixed. Both this and tectonic-typesetting/tectonic-texlive-bundles#14 should be ready to merge.
Codecov Report
Attention: Patch coverage is 38.40000% with 462 lines in your changes are missing coverage. Please review.
Project coverage is 46.11%. Comparing base (
b3b66f2) to head (4d3f8b6).
Additional details and impacted files
@@ Coverage Diff @@
## master #1121 +/- ##
==========================================
- Coverage 46.40% 46.11% -0.30%
==========================================
Files 176 179 +3
Lines 65118 65413 +295
==========================================
- Hits 30221 30168 -53
- Misses 34897 35245 +348
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
CC @CraftSpider here too.
Hmmm A few tests are still failing on azure, not sure why. Everything passes on my machine.
Hmm, I'm not sure why this error would only be showing up in the cross-tests. The one thing that I can think of is that those run through the cross program, which filters the environment variables that propagate down to the subprograms that it runs ... but I don't see anything about this particular error that would depend on environment variables being set.
Just fixed #74. File names with spaces are now safe in ttbv1 bundles---but newlines will still cause errors.
I'm not sure that's a problem, though, since I've never seen a legitimate use case for files with newlines.
Here's an idea: should this new bundle format also include latex format files?
Now that the bundle-building code is in Rust, it shouldn't be too hard to generate format files while we build the bundle. (assuming format files are identical on all machines)
Would be nice to do that here, so that we only have one round of big, possibly disruptive changes.