actions icon indicating copy to clipboard operation
actions copied to clipboard

Save a warmed R package cache for faster installs in failing Actions

Open schloerke opened this issue 1 year ago • 15 comments

Goal

When restoring the R package cache, if the primary key is not used, save the currently installed R packages to use as a warmed cache for a followup Action.

If the primary key is used to restore the R package cache, then no secondary cache is saved as there will be a cache hit for followup Actions.

Motivation

actions/cache@v3 is not saved if the Action fails. To me, this is incorrect when installing R packages. The previously installed packages should be available for faster installation in a followup Action.

Cons

  • The secondary cache will not save packages that are installed after calling r-lib/actions/setup-r-dependencies
  • Uses a second cache which will take up more cache storage space

Other approaches

This was not possible before actions/cache had the tag v3.

To remove the cons above, we could

  • Restore package cache with actions/cache/restore@v3
  • then install the R packages
  • then cache the R packages with actions/cache/save@v3

However, packages installed outside of r-lib/actions/setup-r-dependencies will not be saved with this approach. While not common, rstudio/shinycoreci installs packages after the initial dependency installation and would benefit from caching all installed packages when the Action is complete via actions/cache@v3.

If the Actions are successful, the secondary cache will not be saved or utilized after the initial passing Action.

Example usage

  • Saving the cache on a failing run
    • https://github.com/rstudio/htmltools/actions/runs/3943533959/jobs/6768411135#step:3:2930
  • Saving the cache on the initial successful run
    • https://github.com/rstudio/htmltools/actions/runs/3943533959/jobs/6768411847#step:3:4328
  • No cache being saved when the primary key is hit
    • There should be a call to Run actions/cache/save@v3 just before Run # Session info
    • https://github.com/rstudio/htmltools/actions/runs/3943533959/jobs/6768489863#step:3:3979

Update: 01/20

Using actions/cache/restore@v3 and actions/cache/save@v3 to restore and save the package library.

Benefit:

  • Cache is warmed on failing runs
  • (Same amount of cache storage)
  • (No confusing message when the cache is tried to be saved twice on an initial successful Action)

Does not capture packages that were installed after the initial installation, but those packages are NOT contained within the cache key. So we should not worry about them.

schloerke avatar Jan 18 '23 21:01 schloerke

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 83.33%. Comparing base (514fd75) to head (1159dc9).

:exclamation: Current head 1159dc9 differs from pull request most recent head 6d13c6a. Consider uploading reports for the commit 6d13c6a to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           v2-branch     #695   +/-   ##
==========================================
  Coverage      83.33%   83.33%           
==========================================
  Files              3        3           
  Lines             12       12           
==========================================
  Hits              10       10           
  Misses             2        2           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Jan 18 '23 21:01 codecov-commenter

If the cache is not being updated on a successful Action given a cache hit, I think we should use the actions/cache/restore@v3.

This would get rid of the message and gives a clean solution.

rstudio/shinycoreci can worry about itself later on. I'd rather have a clean/non-confusing solution.

** Implementing with actions/cache/restore@v3


Do you know how to save a multiline step output to $GITHUB_OUTPUT?

I'd like to save the cache path info and use it as an output value.

schloerke avatar Jan 20 '23 21:01 schloerke

Request to pass the path value from the restore step to be able to use it in the save step: https://github.com/actions/cache/issues/1082

schloerke avatar Jan 20 '23 22:01 schloerke

@schloerke Thanks for this! Great timing - I was just exploring the same issue.

dan-knight avatar Jan 25 '23 22:01 dan-knight

The drawback of this is that sometimes the reason for failure is botched packages installation, and we would save a botched cache, that will be potentially picked up across all branches. We have seen this happen for example for mis-specified RSPM repos. So I would be more comfortable if this was opt-in.

gaborcsardi avatar Apr 01 '23 15:04 gaborcsardi

The drawback of this is that sometimes the reason for failure is botched packages installation

Do you mean that the restore step works, but then the package installation fails? If this is the case, nothing would be saved.

Or do you mean that a bad package is installed successfully but during package checking, it fails? In this situation, the bad package would be cached for future restores.


I would be more comfortable if this was opt-in.

I do not believe a step's uses field can be dynamic. So this would require a different action file completely. 🙁 It would be very nice if we could avoid duplicating the installation code.

If it can be dynamic, I'm happy with it being opt-in.

schloerke avatar Apr 02 '23 02:04 schloerke

Or do you mean that a bad package is installed successfully but during package checking, it fails? In this situation, the bad package would be cached for future restores.

Yes.

If it can be dynamic, I'm happy with it being opt-in.

You can have alternative cache steps depending on an input parameter.

gaborcsardi avatar Apr 02 '23 07:04 gaborcsardi

If it can be dynamic, I'm happy with it being opt-in.

You can have alternative cache steps depending on an input parameter.

Will do!


  • [x] Implement different cache steps based on input parameter

schloerke avatar Apr 04 '23 18:04 schloerke

(Rebased and squashed to remove the long-winded commit history)

schloerke avatar May 02 '23 19:05 schloerke

If during packing installation, support was added for removing packages that are installed but not contained within the lock file, then it would fit with this approach nicely. This way, the library would never balloon with unnecessary packages.

  • [ ] Accept this PR once PR adds ability to remove packages not contained within lock file

schloerke avatar Feb 14 '24 15:02 schloerke

Will request a review when the PR is stable. Thank you!

schloerke avatar Feb 14 '24 17:02 schloerke

Will request a review when the PR is stable. Thank you!

I wonder if it was better and easier to implement this is in pak itself.

gaborcsardi avatar Feb 14 '24 17:02 gaborcsardi

Will request a review when the PR is stable. Thank you!

I wonder if it was better and easier to implement this is in pak itself.

It seems fair to implement it all the way in pkgdepends in the $install() method. Where a config value could be prune=TRUE to remove the packages. (But that seemed like a lot of up hill work for me.)


If I can show the PR w/ prune works, I'd be happy to have you relocate it anywhere that makes more sense.

schloerke avatar Feb 14 '24 17:02 schloerke

Proof of pruning packages: https://github.com/ggobi/ggally/actions/runs/7905722423/job/21578966735?pr=493#step:3:7274

Run # Prune packages
Prune unnecessary packages
  Removing unnecessary packages:  httpuv, later, promises, shiny, sourcetools, xtable 
  Removing packages from ‘/home/runner/work/_temp/Library’
  (as ‘lib’ is unspecified)

Proof of not pruning packages: https://github.com/ggobi/ggally/actions/runs/7906986262/job/21583015846?pr=493#step:3:7277

Run # Prune packages
Prune unnecessary packages
  No unnecessary packages to remove

schloerke avatar Feb 14 '24 20:02 schloerke

@gaborcsardi I'm happy to have you relocate the code to the package you think is best. The addition of pruning packages that are not defined in the lock file makes me feel safe when using a cache that will most likely be restored and should always be saved. This prevents the R library from being poisoned from packages installed from earlier runs.

If users do not want to prune the library, they can opt-out. (Pruning only happens when cache="always", which is opt-in.)

schloerke avatar Feb 14 '24 20:02 schloerke

  • [x] Update to use actions/cache@v4 as it now has a save-always: flag. Related: https://github.com/actions/cache/commit/b1378c8403e6779ec61a229f419bc68c0e118f02

schloerke avatar Mar 28 '24 14:03 schloerke

Update to use actions/cache@v4 as it now has a save-always: flag. Related: https://github.com/actions/cache/commit/b1378c8403e6779ec61a229f419bc68c0e118f02

Oh, that is awesome!

gaborcsardi avatar Mar 28 '24 15:03 gaborcsardi

Thank you! ✅

schloerke avatar May 08 '24 13:05 schloerke

Thank you, and sorry for the long wait!

gaborcsardi avatar May 08 '24 13:05 gaborcsardi

@gaborcsardi If you could update the v2 tag (and others?) for the repo, that'd be great! It'll allow me to delete my branch and have a consistent step location. Thank you!

schloerke avatar May 08 '24 14:05 schloerke

Yep, I am preparing a release, either today or tomorrow.

gaborcsardi avatar May 08 '24 14:05 gaborcsardi

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue and include a link to this pull request.

github-actions[bot] avatar May 23 '24 01:05 github-actions[bot]