tectonic icon indicating copy to clipboard operation
tectonic copied to clipboard

A more robust manifest format

Open rekka opened this issue 7 years ago • 10 comments

Currently the manifest format for the local cache is stored in a space-separated format without any escaping. This leads to invalid entries being inserted if for instance files with spaces are requested:

\input{"with space.tex"}\bye

produces entries

with space.tex 0 -
with space.tex.tex 0 -

It is not really a serious issue at the moment, but a future-proofing of the format is desirable.

I believe that one appropriate solution would be to use the csv crate, which also allows for space-separated entries. So it should be backwards compatible with the current format. It also offers easy parsing using serde. Moreover, it will most likely hit version 1 soon.

I am happy to implement this.

rekka avatar Jun 06 '17 03:06 rekka

Yes, it would be good to tighten things up here.

I might prefer to stick with the current format but just reverse the ordering: size, SHA256, filename. That way you can just split on the first two spaces and take the rest as a lump.

Although I guess we're still fragile if anyone uses a filename containing a newline ...

pkgw avatar Jun 06 '17 12:06 pkgw

That is actually a pretty good solution too. Now files with newlines...

rekka avatar Jun 06 '17 14:06 rekka

I was just browsing through looking at this, and poked around until I found the relevant parts of the code. It looks like src/io/local_cache:176 is where it gets written; I'm not sure where it gets read from the manifest.

alexander-bauer avatar Jun 06 '17 23:06 alexander-bauer

It is a bit above that, in the new method.

rekka avatar Jun 06 '17 23:06 rekka

@rekka I can go ahead and make a PR for this, but I'd also hate to duplicate work, if you'd like to

alexander-bauer avatar Jun 06 '17 23:06 alexander-bauer

I'm not currently working on this, but I think we should reach a consensus with @pkgw on what the best approach here is.

I am not leaning towards a simple format as suggested by @pkgw with: size SHA256 filename, and completely disallowing files with new lines to enter the manifest. This can be easily parsed by str::splitn method.

rekka avatar Jun 07 '17 00:06 rekka

@rekka Do you perhaps mean "now" and not "not" in your most recent comment?

I think this is a good change but I am a little bit concerned about how to transition from an old manifest format to a new one. I guess we can just munge a version code into the file name, which seems reasonably future-proof if we want to make any other changes to the format.

pkgw avatar Jun 07 '17 12:06 pkgw

Ah, sorry about that, I meant "not", not working on this.

A version code is a good option. I suppose you mean that tectonic would just discard an outdated manifest and rebuild the cache from scratch. That's simple.

rekka avatar Jun 07 '17 12:06 rekka

Yes, re-downloading files is a bit annoying but it seems like the lowest-pain option. I don't want people to have to be told to manually delete their cache directories if I can avoid it.

pkgw avatar Jun 07 '17 12:06 pkgw

An option that's a little less rough on the UX would be to migrate from the older manifest version automatically, but I suspect that maintaining the code for that (and all future manifest improvements) would be daunting.

Perhaps the UX problem could be offset (if it needs to be at all) by printing an explanatory message when creating the new manifest if an old one is already present?

alexander-bauer avatar Jun 07 '17 15:06 alexander-bauer