actions
actions copied to clipboard
Save a warmed R package cache for faster installs in failing Actions
Goal
When restoring the R package cache, if the primary key is not used, save the currently installed R packages to use as a warmed cache for a followup Action.
If the primary key is used to restore the R package cache, then no secondary cache is saved as there will be a cache hit for followup Actions.
Motivation
actions/cache@v3
is not saved if the Action fails. To me, this is incorrect when installing R packages. The previously installed packages should be available for faster installation in a followup Action.
Cons
- The secondary cache will not save packages that are installed after calling
r-lib/actions/setup-r-dependencies
- Uses a second cache which will take up more cache storage space
Other approaches
This was not possible before actions/cache
had the tag v3
.
To remove the cons above, we could
- Restore package cache with
actions/cache/restore@v3
- then install the R packages
- then cache the R packages with
actions/cache/save@v3
However, packages installed outside of r-lib/actions/setup-r-dependencies
will not be saved with this approach.
While not common, rstudio/shinycoreci
installs packages after the initial dependency installation and would benefit from caching all installed packages when the Action is complete via actions/cache@v3
.
If the Actions are successful, the secondary cache will not be saved or utilized after the initial passing Action.
Example usage
- Saving the cache on a failing run
- https://github.com/rstudio/htmltools/actions/runs/3943533959/jobs/6768411135#step:3:2930
- Saving the cache on the initial successful run
- https://github.com/rstudio/htmltools/actions/runs/3943533959/jobs/6768411847#step:3:4328
- No cache being saved when the primary key is hit
- There should be a call to
Run actions/cache/save@v3
just beforeRun # Session info
- https://github.com/rstudio/htmltools/actions/runs/3943533959/jobs/6768489863#step:3:3979
- There should be a call to
Update: 01/20
Using actions/cache/restore@v3
and actions/cache/save@v3
to restore and save the package library.
Benefit:
- Cache is warmed on failing runs
- (Same amount of cache storage)
- (No confusing message when the cache is tried to be saved twice on an initial successful Action)
Does not capture packages that were installed after the initial installation, but those packages are NOT contained within the cache key. So we should not worry about them.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 83.33%. Comparing base (
514fd75
) to head (1159dc9
).
:exclamation: Current head 1159dc9 differs from pull request most recent head 6d13c6a. Consider uploading reports for the commit 6d13c6a to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## v2-branch #695 +/- ##
==========================================
Coverage 83.33% 83.33%
==========================================
Files 3 3
Lines 12 12
==========================================
Hits 10 10
Misses 2 2
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
If the cache is not being updated on a successful Action given a cache hit, I think we should use the actions/cache/restore@v3
.
This would get rid of the message and gives a clean solution.
rstudio/shinycoreci
can worry about itself later on. I'd rather have a clean/non-confusing solution.
** Implementing with actions/cache/restore@v3
Do you know how to save a multiline step output to $GITHUB_OUTPUT
?
I'd like to save the cache path
info and use it as an output value.
Request to pass the path
value from the restore step to be able to use it in the save step: https://github.com/actions/cache/issues/1082
@schloerke Thanks for this! Great timing - I was just exploring the same issue.
The drawback of this is that sometimes the reason for failure is botched packages installation, and we would save a botched cache, that will be potentially picked up across all branches. We have seen this happen for example for mis-specified RSPM repos. So I would be more comfortable if this was opt-in.
The drawback of this is that sometimes the reason for failure is botched packages installation
Do you mean that the restore step works, but then the package installation fails? If this is the case, nothing would be saved.
Or do you mean that a bad package is installed successfully but during package checking, it fails? In this situation, the bad package would be cached for future restores.
I would be more comfortable if this was opt-in.
I do not believe a step
's uses
field can be dynamic. So this would require a different action file completely. 🙁 It would be very nice if we could avoid duplicating the installation code.
If it can be dynamic, I'm happy with it being opt-in.
Or do you mean that a bad package is installed successfully but during package checking, it fails? In this situation, the bad package would be cached for future restores.
Yes.
If it can be dynamic, I'm happy with it being opt-in.
You can have alternative cache steps depending on an input parameter.
If it can be dynamic, I'm happy with it being opt-in.
You can have alternative cache steps depending on an input parameter.
Will do!
- [x] Implement different cache steps based on input parameter
(Rebased and squashed to remove the long-winded commit history)
If during packing installation, support was added for removing packages that are installed but not contained within the lock file, then it would fit with this approach nicely. This way, the library would never balloon with unnecessary packages.
- [ ] Accept this PR once PR adds ability to remove packages not contained within lock file
Will request a review when the PR is stable. Thank you!
Will request a review when the PR is stable. Thank you!
I wonder if it was better and easier to implement this is in pak itself.
Will request a review when the PR is stable. Thank you!
I wonder if it was better and easier to implement this is in pak itself.
It seems fair to implement it all the way in pkgdepends in the $install()
method. Where a config value could be prune=TRUE
to remove the packages. (But that seemed like a lot of up hill work for me.)
If I can show the PR w/ prune works, I'd be happy to have you relocate it anywhere that makes more sense.
Proof of pruning packages: https://github.com/ggobi/ggally/actions/runs/7905722423/job/21578966735?pr=493#step:3:7274
Run # Prune packages
Prune unnecessary packages
Removing unnecessary packages: httpuv, later, promises, shiny, sourcetools, xtable
Removing packages from ‘/home/runner/work/_temp/Library’
(as ‘lib’ is unspecified)
Proof of not pruning packages: https://github.com/ggobi/ggally/actions/runs/7906986262/job/21583015846?pr=493#step:3:7277
Run # Prune packages
Prune unnecessary packages
No unnecessary packages to remove
@gaborcsardi I'm happy to have you relocate the code to the package you think is best. The addition of pruning packages that are not defined in the lock file makes me feel safe when using a cache that will most likely be restored and should always be saved. This prevents the R library from being poisoned from packages installed from earlier runs.
If users do not want to prune the library, they can opt-out. (Pruning only happens when cache="always"
, which is opt-in.)
- [x] Update to use
actions/cache@v4
as it now has asave-always:
flag. Related: https://github.com/actions/cache/commit/b1378c8403e6779ec61a229f419bc68c0e118f02
Update to use actions/cache@v4 as it now has a save-always: flag. Related: https://github.com/actions/cache/commit/b1378c8403e6779ec61a229f419bc68c0e118f02
Oh, that is awesome!
Thank you! ✅
Thank you, and sorry for the long wait!
@gaborcsardi If you could update the v2
tag (and others?) for the repo, that'd be great! It'll allow me to delete my branch and have a consistent step location. Thank you!
Yep, I am preparing a release, either today or tomorrow.
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue and include a link to this pull request.