Bootsnap compile cache isn't as useful as it could be on CI and production
Problem 1: cache keys are mtime based
Bootsnap's compile cache is very efficient for development workflows, but on CI or production, it becomes almost unusable.
The main reason is that git (like most other VCS) doesn't store mtime, so on CI or production, unless your setup manages to preserve mtime, the entire compile cache will be invalidated. And most CI / production systems start from a fresh clone of the repository.
The solution to this would be to use file digests instead of mtime. Of course hashing a source file is slower than just reading its mtime, but compared to parsing the Ruby source file, a fast hash function would still offer a major speed up.
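To make the difference concrete, here is a minimal sketch of the two kinds of keys. `mtime_key` and `digest_key` are hypothetical helpers, not Bootsnap's API, and SHA256 is only a stdlib stand-in for whatever fast hash function would actually be picked.

```ruby
require "digest"
require "tmpdir"

# Hypothetical helper: a key that changes whenever the file's mtime changes.
def mtime_key(path)
  File.mtime(path).to_i
end

# Hypothetical helper: a key that only depends on the file's content.
# SHA256 is a stand-in here, a faster non-cryptographic hash would do.
def digest_key(path)
  Digest::SHA256.file(path).hexdigest
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "example.rb")
  File.write(path, "puts :hello\n")

  mtime_before  = mtime_key(path)
  digest_before = digest_key(path)

  # Simulate a fresh clone: same content, different mtime.
  File.utime(Time.at(0), Time.at(0), path)

  puts "mtime key stable?  #{mtime_key(path) == mtime_before}"
  puts "digest key stable? #{digest_key(path) == digest_before}"
end
```

The digest key survives the mtime reset, which is the property a fresh clone on CI needs.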
Problem 2: the cache isn't self cleaning
The compile cache entries are stored based on the path of the source file. e.g. the cache for path/to.rb will be stored in <cache-dir>/<fnv1a_64(path)>. So if you keep persisting the cache between CI builds or production deploys, over time as you delete some source files, update gems etc, new entries will be created, but outdated ones won't be removed, which might lead to a very bloated cache.
Hence the note in the README about regularly flushing the cache.
And the problem can be even worse with some deploy methods like capistrano, with which the real path of the source files changes on every deploy.
So even if we were to fix the mtime issue, we'd need to address cache GC, otherwise users would run into serious trouble.
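As a rough illustration of why path-keyed entries pile up, here is a sketch of the naming scheme described above. `cache_entry_for` and the release paths are made up for the example, and the flat `<cache-dir>/<hash>` layout is a simplification rather than Bootsnap's exact on-disk format. The constants are the standard FNV-1a 64-bit offset basis and prime.

```ruby
# Standard FNV-1a 64-bit parameters.
FNV_OFFSET_BASIS = 0xcbf29ce484222325
FNV_PRIME        = 0x100000001b3
MASK_64          = 0xffffffffffffffff

# Plain Ruby FNV-1a over the bytes of the string, truncated to 64 bits.
def fnv1a_64(string)
  string.each_byte.reduce(FNV_OFFSET_BASIS) do |hash, byte|
    ((hash ^ byte) * FNV_PRIME) & MASK_64
  end
end

# Hypothetical layout: <cache_dir>/<16 hex digits of fnv1a_64(path)>
def cache_entry_for(cache_dir, source_path)
  File.join(cache_dir, format("%016x", fnv1a_64(source_path)))
end

# Same file, two deploys: the real path differs, so the entry differs
# and the old one is never reused nor removed.
old_release = "/app/releases/20200101/lib/foo.rb"
new_release = "/app/releases/20200102/lib/foo.rb"

puts cache_entry_for("tmp/cache", old_release)
puts cache_entry_for("tmp/cache", new_release)
```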
Here I'm not too sure what the best solution could be, but I have a few ideas
Solution 2.1: Splitting the cache
Assuming the biggest source of cache garbage is gem upgrades, we could have one compile cache directory per gem, e.g. we could store cache for $GEM_ROOT/gems/my-gem-1.2.3/lib/my-gem.rb in $GEM_ROOT/gems/my-gem-1.2.3/.bootsnap/<fnv1a_64(path)>, or even $GEM_ROOT/gems/my-gem-1.2.3/lib/my-gem.rb.bootsnap.
This way when you upgrade or remove a gem you automatically get rid of the old cache.
However:
- This assumes the gem directory is writable, which is not always the case.
- It requires looking up the gem root directory, which might be costly (unless we use the second path format)
I think that if we were to implement this, the vast majority of the GC problem would be solved, as path changes inside the application are much less likely to be frequent enough to produce the problem unless you keep the cache for a very long time.
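A quick sketch of what the two path formats from solution 2.1 could look like. All helper names are hypothetical, and the gem root lookup is a naive path match on a `gems/<name>-<version>/` segment, assumed here only for illustration.

```ruby
# Plain Ruby FNV-1a 64-bit, used only to name the cache entry.
def fnv1a_64(string)
  string.each_byte.reduce(0xcbf29ce484222325) do |hash, byte|
    ((hash ^ byte) * 0x100000001b3) & 0xffffffffffffffff
  end
end

# Hypothetical heuristic: everything up to and including the last
# "/gems/<something>" segment is treated as the gem root.
# Returns nil for paths that are not inside a gem.
def gem_root_for(source_path)
  source_path[%r{\A.*/gems/[^/]+(?=/)}]
end

# First format: <gem root>/.bootsnap/<fnv1a_64(path)>
def per_gem_cache_path(source_path)
  root = gem_root_for(source_path)
  return nil unless root

  File.join(root, ".bootsnap", format("%016x", fnv1a_64(source_path)))
end

# Second format: cache stored next to the source, no gem root lookup needed.
def sibling_cache_path(source_path)
  "#{source_path}.bootsnap"
end

# Both formats need a writable gem directory, otherwise we'd have to
# fall back to the shared cache directory.
def gem_cache_usable?(source_path)
  root = gem_root_for(source_path)
  !root.nil? && File.writable?(root)
end

source = "/usr/local/bundle/gems/my-gem-1.2.3/lib/my-gem.rb"
puts per_gem_cache_path(source)
puts sibling_cache_path(source)
puts gem_cache_usable?(source)
```

Either way the cache lives under the versioned gem directory, so it goes away with the gem.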
Solution 2.2: bootsnap precompile --clean
This is much less of a general solution, as I don't think it is likely that a large portion of users would integrate bootsnap precompile in their workflow, but in theory we could have it clean the outdated cache entries. Since it will go over all the source files to precompile them, it can make a list of up to date cache entries and delete the rest.
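Roughly, the cleaning pass could look like the sketch below. `entry_name` and `clean_cache` are hypothetical names, and the flat one-file-per-entry cache directory is a simplification, not how `bootsnap precompile` is actually implemented.

```ruby
require "fileutils"
require "set"
require "tmpdir"

# Plain Ruby FNV-1a 64-bit, used only to name the cache entry.
def fnv1a_64(string)
  string.each_byte.reduce(0xcbf29ce484222325) do |hash, byte|
    ((hash ^ byte) * 0x100000001b3) & 0xffffffffffffffff
  end
end

# Hypothetical naming: one file per source path, named after its hash.
def entry_name(source_path)
  format("%016x", fnv1a_64(File.expand_path(source_path)))
end

# Keep the entries that belong to the given source files, delete the rest.
# Returns the names of the deleted entries.
def clean_cache(cache_dir, source_paths)
  live  = source_paths.map { |path| entry_name(path) }.to_set
  stale = Dir.children(cache_dir).reject { |entry| live.include?(entry) }
  stale.each { |entry| FileUtils.rm_f(File.join(cache_dir, entry)) }
  stale
end

Dir.mktmpdir do |dir|
  cache_dir = File.join(dir, "cache")
  FileUtils.mkdir_p(cache_dir)

  kept    = File.join(dir, "kept.rb")    # still part of the app
  removed = File.join(dir, "removed.rb") # deleted from the app since the last build
  [kept, removed].each { |path| FileUtils.touch(File.join(cache_dir, entry_name(path))) }

  deleted = clean_cache(cache_dir, [kept])
  puts "deleted #{deleted.size} stale entries, #{Dir.children(cache_dir).size} left"
end
```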
Thoughts
These two changes aren't necessarily that hard to implement, but they are quite significant, likely justifying a major version bump. So rather than start writing PRs head on, I'd like to have some feedback on the idea.
@burke I saw you removed yourself from the CODEOWNERS, but if you have a bit of time your insights here would be more than welcome.
@rafaelfranca @DazWorrall I think you may have opinions or insights on this.
For problem 1 I think we should make the cache invalidation configurable so we can keep mtime in development.
For problem 2 I think solution 2.2 is the least invasive, the least prone to problems, and the least likely to add a runtime penalty.
I think we should make the cache invalidation configurable so we can keep mtime in development.
I think we should first measure how much slower the digest based cache would be.
I think solution 2.2 is the least invasive, the least prone to problems, and the least likely to add a runtime penalty.
The two proposed solutions aren't mutually exclusive. My issue with 2.2 is that it is harder to integrate.
For Problem 1 could we use something like https://github.com/rosylilly to restore the mtime from the latest github commit? @esnunes tried it locally and it was quite slow when going over all files but we could apply it on a subset of files.
could we use something like
I suppose you mean: https://github.com/rosylilly/git-set-mtime
it was quite slow when going over all files
that's what I'd expect yes.
but we could apply it on a subset of files.
I don't think there is any useful subset: bootsnap has one entry for every single Ruby source file. You might save a bit by skipping various text files etc, but it's not as reliable.
Also note that this issue is not just a Shopify thing, the idea is to improve the situation for all users, not just us.
It could be made configurable to use git-set-mtime, so it could be used in CI environments. How would the speed compare to using file digests?
Does it sound like this issue is the same as:
https://github.com/heroku/heroku-buildpack-ruby/issues/979
I originally thought that this issue was due to a different dir at boot and runtime. But I tried to reproduce that issue locally and it looks like moving an app after the cache is generated works just fine. I'm not sure what conditions reproduce it other than deploying on Heroku.
Does it sound like this issue is the same as:
Not quite. The issue I'm describing is not supposed to make the cache grow. But the issue you link is another interesting thing. Since bootsnap uses realpaths, with buildpacks moving the code around I suppose even the cache generated by assets:precompile can't be re-used.