asset_compress
hash should also depend on timestamp of files compressed
If the hash in the compressed file's name depends on the date the files were modified, the old version will not be served from the client browser's cache, and the new version will always be delivered...
The hash filenames don't depend on the timestamps. So I don't really understand the issue here. Perhaps you can give some more information?
I understand now, you are saying that the hash should depend on the mtime of the component files. I've been thinking of making the builds mtime sensitive so you don't spend time rebuilding assets that don't need to be. This would work well with that plan.
Sorry for not answering earlier.
Yes, it is exactly that. Making the hash also depend on mtime would make asset_compress always deliver the current version of the file; depending only on the name could make asset_compress deliver an old version of a file.
Is this still an issue currently? I would find that useful for development, where you could invalidate the cache and get a valid current version on the next page hit.
My opinion is that the "file name hash" should not be a timestamp and instead should be a hash of the file contents. I rely on the file name being a representation of the file contents in my caching.
I use a git-deploy workflow. As part of the deploy I throw away my last deployed version and check out only the most recent version of my code, which means any previously generated cached css/js files are thrown away. Also, as I'm doing a git checkout, the modification dates on my src files are lost.
I'm setting my cached files to have a cache max-age years in the future and relying on the hash as an indication of whether the user needs to download a new version. Currently, whenever I deploy, the cached version becomes useless because a new timestamp is appended to the filename. This would easily be solved by making the "hash" a representation of the file contents.
I'm happy to implement these changes; I'd just like to know whether you want me to, and whether you'd like it as an optional configuration item or the default behaviour.
Thanks.
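The deploy problem described above can be seen concretely. A minimal Python sketch (illustrative only; the plugin itself is PHP, and `app.*.js` is a made-up naming scheme): a name keyed to mtime changes after every fresh checkout even when contents are identical, while a name keyed to a content hash stays stable.

```python
import hashlib
import os

def timestamp_name(path):
    # Name keyed to mtime: changes whenever the file is rewritten,
    # even if its contents are identical (e.g. after a fresh checkout).
    return f"app.{int(os.path.getmtime(path))}.js"

def checksum_name(path):
    # Name keyed to contents: stable across deploys for as long as
    # the file itself is unchanged, so far-future max-age caching works.
    with open(path, "rb") as f:
        return f"app.{hashlib.md5(f.read()).hexdigest()}.js"
```

Touching the file (as a fresh checkout effectively does) changes `timestamp_name` but leaves `checksum_name` untouched.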
So instead of checking mtime you would be doing something like md5sum on each file?
Correct.
How would the hash for a build file be calculated? One nice thing with mtimes is that the cli builder can skip fresh builds with very little disk IO.
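The freshness check being described could look something like this minimal Python sketch (hypothetical helper name; the real builder is PHP): deciding whether a build can be skipped costs only a few `stat()` calls, with no file reads.

```python
import os

def build_is_fresh(build_path, src_paths):
    # A build can be skipped if the output exists and is at least as
    # new as every component file. Only mtime lookups are needed, so
    # this is much cheaper than hashing file contents.
    if not os.path.exists(build_path):
        return False
    build_mtime = os.path.getmtime(build_path)
    return all(os.path.getmtime(p) <= build_mtime for p in src_paths)
```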
Hmm, you're right. Hashing the files would be quite IO intensive compared to checking mtimes.
I was thinking of concatenating each src file's contents and hashing that.
However, we are dealing with relatively small files, which will be loaded and hashed very quickly. Also, the application is run via the command line and, unless there is a use case I'm not aware of, this tool is only really run on deploy, which is infrequent.
I guess the question becomes: is it worth the extra overhead for a hash that represents the file contents over the mtime? I'm not sure how many people are affected by inaccurate mtimes like I am via git.
I could alternatively implement it as an optional feature.
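The concatenate-and-hash idea can be sketched in a few lines of Python (again illustrative, not the plugin's PHP; `build_hash` is a hypothetical name):

```python
import hashlib

def build_hash(src_paths):
    # Feed each source file's contents into one digest, in the order
    # the build lists them. A content change in any component changes
    # the hash, regardless of mtimes.
    digest = hashlib.md5()
    for path in src_paths:
        with open(path, "rb") as f:
            digest.update(f.read())
    return digest.hexdigest()
```

Streaming each file through `digest.update()` avoids building one large concatenated string in memory.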
I don't like optional features; they end up confusing people and becoming bloat/broken over time. Why do the mtimes of your files become inaccurate when deploying from git? From what I can see, git stores the mtime of files.
Good point.
After running `git clone https://github.com/markstory/asset_compress.git`, all mtimes on my local machine are set to the current time.
Git does not preserve mtimes, as described in the following FAQ: https://git.wiki.kernel.org/index.php/Git_FAQ#Why_isn.27t_Git_preserving_modification_time_on_files.3F
Ah, I forgot to do a fresh clone - derp.
I have some concerns that using checksums will have other unintended consequences. For example, the helper currently finds the most current build asset using glob(), which will not be possible if files are 'timestamped' with checksums.
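A Python sketch of the kind of glob()-based "latest build" lookup being recalled here (hypothetical; the real helper is PHP, and `app.<timestamp>.js` is an assumed naming scheme): numeric timestamp suffixes give a natural ordering to pick the newest file, whereas checksum suffixes have no comparable ordering.

```python
import glob
import os
import re

def latest_build(directory, prefix="app"):
    # Pick the newest timestamp-suffixed build by sorting on the
    # numeric suffix. With checksum suffixes there is no such
    # ordering, which is the concern raised above.
    pattern = os.path.join(directory, f"{prefix}.*.js")
    def stamp(path):
        m = re.search(r"\.(\d+)\.js$", path)
        return int(m.group(1)) if m else -1
    candidates = glob.glob(pattern)
    return max(candidates, key=stamp) if candidates else None
```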
My understanding was that the helper read the "cake cache" item for builds; otherwise it built a dev version without a version hash.
Would you mind pointing me to the code you are referring to?
Looks like I was recalling how the code used to work. The current code doesn't use glob and works as you described.
I might fork off and implement this feature how I expect it to work (I need it for my own use anyhow) and come back and see what you think.
I understand you're very busy at the moment with CakePHP 3.
Thank you.
Sounds like a plan. I'm not against the idea of using hashes, as they have some benefits over timestamps in scenarios like git deploys or bad rsync implementations.