vfsgen
vfsgen copied to clipboard
Caching file modification times.
Hi and thanks for the effort you have put into this excellent package.
One enhancement that I would find useful is the caching of file modification times. What I find is that if I run "make generate" on my project the generated vfsgen files can have different modification times for the files even if they have not changed. This is not devastating but I for one would find it handy if the generated vfsgen files did not change unless the input files were in fact different.
I was thinking that one way to achieve this might be to generate a JSON file that was keyed on file name and contained a digest of the file name and a modification time. If the file had not changed then the modification time was used. If the file had changed then the current time would be used and the JSON file would be updated accordingly:
{
"img/favicon.png": {
"md5": "123456789abcdef0fedcba987654321",
"mtime": "2015-11-14T18:56:32"
},
"js/app.js": {
"md5": "fb3456789abcdef0fedcba987654321",
"mtime": "2015-12-16T11:47:32"
}
}
Now I realise that this would be a significant addition of functionality and your contribution guidelines indicate that it might be considered out of scope for the package. So before I start serious work on a PR, could I ask for your thoughts on this idea?
Hi @jjeffery, thanks for your feedback on vfsgen, and for and creating this issue!
I'd like to discuss this to better understand the problem and solution space, before we get to implementation details.
vfsgen fully supports and respects the modification times in the source virtual filesystem (VFS), and preserves them in the output VFS. Running it multiple times on same input should produce same output (i.e., it's deterministic). It even normalizes the timezone as UTC, so that people running vfsgen in different parts of the world (i.e., where their local timezone is different) would still produce same output.
So, I want to learn more about this:
What I find is that if I run "make generate" on my project the generated vfsgen files can have different modification times for the files even if they have not changed.
What does your make generate
do? Why do the modification times change, if the files do not change?
Thanks for the reply @shurcooL. I'm keen to discuss and see if perhaps there is a better way to achieve what I am trying to do. Perhaps there is.
Apologies for make generate
, I meant to say go generate
.
As you say, vfsgen does respect the file modification time, and if that does not change then the output of vfsgen is consistent. I take your point about converting to UTC and notice that I omitted to put a trailing "Z" in my example timestamps above, which would be necessary.
Where I run into issues is when the modification time changes but nothing else changes. In my experience the most common examples of this are:
- Source code being checked out to different machines will have different modification times
- CI server that checks out the source from Git each time will have a different modification time for each build run
In these cases I would like vfsgen to recognize that the MD5 (or SHA1, whatever) is the same and assign a previously allocated modification time so that the generated code matches exactly.
The attached screenshot might help explain. I have just checked out one of our repos and run go generate
. The new code uses the date/time that I am writing this (2017-02-02 01:46:09Z) because that is the modification time of the files in the code repo that I have just cloned.
As an aside, I'm not sure why the previous version is showing fractions of a second and the recently generated code shows to second precision only. As far as I can see the code is just using MarshalText
so that's a small mystery at this stage. I'll have to check this out when I'm near a Windows machine, but I suspect that if you run vfsgen
on Windows you get a modification time to fractions of a second, but on Unix systems the modification time resolution is one second.
(Sorry for the delay; I lost track of this issue and forgot to get back to you.)
Thanks for providing that info.
Where I run into issues is when the modification time changes but nothing else changes. In my experience the most common examples of this are:
- Source code being checked out to different machines will have different modification times
- CI server that checks out the source from Git each time will have a different modification time for each build run
I see. That makes sense. I run into the source code being checked out causing modtimes to change on disk occasionally too.
In these cases I would like vfsgen to recognize that the MD5 (or SHA1, whatever) is the same and assign a previously allocated modification time so that the generated code matches exactly.
Right, so vfsgen
currently has no input aside from the input VFS itself, where it sources the modtimes from. If I understand your proposed solution, it's to add an optional cache that would override the modtimes from input VFS if the cached hashes (and filesizes, I presume) match, right?
Question about that approach, where do you anticipate the cache would be stored? Imagine if I check something in, and another person pulls the same repo on a different computer. If the cache isn't a part of the repo, they're unlikely to be able to access it, right? Also, CI servers are probably not going to get a copy of your cache either.
Here's what I'm thinking so far. The whole approach of potentially using a cache is something that happens before vfsgen produces the output. It's pretty orthogonal from the actual generation process and can be cleanly abstracted out. The code that does it can be a part of vfsgen, or it can be a part of a separate tool.
However, for prototype purposes, I think your best bet would be to prototype it as a separate tool from vfsgen. Here's what you can do, create a tool that caches modtimes and hashes, and then give it the ability to "restore" old cached modtimes. It would literally write to disk and change modtimes of files on disk to the saved values. Then, vfsgen would pick up those new modtimes that are the values you want.
How does such an approach sound to you?
In the end, I think that caching modtimes is just one potential approach to solve this overarching issue, and perhaps there's other ways. I'm open to keep thinking about this problem.
It would literally write to disk and change modtimes of files on disk to the saved values.
Just to follow up that thought, such code/tool doesn't have to write to disk necessarily, it can simply provide a VFS wrapper that overrides modtimes of the underlying source VFS.
Thanks for taking the time to consider this. You are absolutely correct in saying that our hypothetically modified vfsgen
needs something additional in order to remember the computed hashes, file sizes, etc, and if it is to work the way I am wanting it to, that storage has to be stored in source control in order for it to be useful.
My initial implementation thought was to introduce a file with a special name, perhaps _vfsgen.json
. If that file was present in the root directory of the VFS, then its contents would be read, and updated if any of the files had actually changed. If the file was not present, then vfsgen
would operate exactly as it does today.
I've already included a pretty simple example of what the contents of _vfsgen.json
might look in my initial comment. (Yes, I think that including the file size would be a good idea).
I can well understand that magic filenames are not to everyone's taste, alternative implementations might be to include an additional command line argument to vfsgen
.
Your point about this being an orthogonal concern to the main job of vfsgen
is absolutely correct, and yes, I should probably just prototype this as a separate utility. My go generate
steps are then in two steps, which should be no problem.
I guess that the only real motivation for including this in vfsgen
would be if this were a common enough problem for all of its users. The convenience of having it built-in would obviate the need for another go get
when setting up the project.
Never mind, I'm getting the impression that perhaps it is not something anyone else has had issue with. Feel free to close this issue, and I'll put together something that restores the timestamps prior to generating with vfsgen
. (The wrapper idea is also a good one -- thanks very much. I'll experiment with that idea).
Sounds good, thanks!
As I said in my previous comment, I'm okay to keep this open with thinking
label and see track any progress we make towards resolving the problem, or at least helping it in some ways.
It'd be great if you could post any updates about your experimentation and success, and tell me how well it works, etc.
If in the end it's something that's worthwhile to integrate into vfsgen, meaning it provides enough value to justify the increased complexity, I would certainly consider integrating it.
That said, I really want to think more and hopefully come up with a simpler solution.
Nice package! Though I've not used it in any app yet. It would be nice if HTTP caching headers could be provided automatically (ETag
, Cache-Control
and other headers in the league as described - for example - here).
We should be overwrite default behavior or disable it - could be just a function that has access to file body and a KV-Store of file/info,header.
Thanks @dc0d. This issue is about some sort caching of file modify dates, for files that are not modified between invocations of vfsgen. Not app-layer HTTP caching. Although the two might be related in some ways.
Still, if you'd like to discuss that, it's better to open a separate issue about that, to avoid crosstalk.
My mistake; sorry.
No problem! If you create that issue, we can discuss it further there.
if this were a common enough problem for all of its users
I also think this would be very useful, as I plan on checking in the go generated file into source control, it would be nice to have a build script that is idempotent, and as such be able to run go genenerate
, then run git ls-files -m
and error if any files have changed.
@shurcooL what's the usefulness of saving the modtime? Maybe this could be added to options and set using a static value?
Assuming that file modification time doesn't actually matter, and only the content of the file matters, the above PR allows a user to set the filemod time to the the time zero value.
cc @jjeffery
what's the usefulness of saving the modtime?
Mostly to enable HTTP caching. When files have accurate modification times, and a request comes in with a If-Modified-Since: Thu, 05 Jul 2012 15:31:30 GMT
header, then it's possible to respond with "304 Not Modified" if a file hasn't been modified since that time.
it would be nice to have a build script that is idempotent, and as such be able to run
go genenerate
, then rungit ls-files -m
and error if any files have changed.
I agree that'd be nice, but I still don't know of a great way of making that possible.
I needed Cache-Control
headers so created this package. Not sure is it's best in the market :) but it works - might be useful for implementing those headers in case those are going to be added.
In the stated use case where there is say assets_vfsdata.go
that is committed to version control, then that means the necessary info is already contained in that file. I don't think a separate cache is necessary.