DifferentialEquations.jl

Use `julia-actions/cache` in CI

Open rikhuijzer opened this issue 6 months ago • 12 comments

Thanks to Ian Butterworth, julia-actions/cache caches ~/.julia/compiled too (https://github.com/julia-actions/cache/pull/71).

Maybe interesting for the SciML ecosystem.

rikhuijzer avatar Dec 01 '23 10:12 rikhuijzer

I've set `fail-fast: false` in 3aec1fb so that GitHub doesn't automatically cancel all runs if one fails.
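For readers unfamiliar with the setting: in GitHub Actions it is spelled `fail-fast` and sits under a job's `strategy` key. The fragment below is only a sketch of where it goes, not the contents of the commit mentioned above; the job name and matrix values are assumptions chosen for illustration.

```yaml
jobs:
  test:                       # hypothetical job name
    runs-on: ${{ matrix.os }}
    strategy:
      # false: when one matrix job fails, the others keep running.
      # true (the GitHub Actions default): the remaining in-progress
      # and queued jobs of the matrix are cancelled.
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest]   # assumed values
        version: ['1']                      # assumed Julia version
```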

rikhuijzer avatar Dec 01 '23 13:12 rikhuijzer

I think generally in SciML it's better to use `fail-fast: true`, since CI already carries a very high computational burden in SciML and there is usually a large number of CI jobs. So IMO everything that fails should fail immediately.

devmotion avatar Dec 01 '23 13:12 devmotion

> I think generally in SciML it's better to use `fail-fast: true` since there's already a very high computational burden due to CI in SciML and usually a large number of CI jobs. So IMO everything that fails should fail immediately.

I think that makes sense if you assume developer time has little value

rikhuijzer avatar Dec 01 '23 15:12 rikhuijzer

What is this caching doing? Before and after?

ChrisRackauckas avatar Dec 01 '23 16:12 ChrisRackauckas

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (4f8e26f) 86.09% compared to head (94a7c6e) 86.09%. Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1004   +/-   ##
=======================================
  Coverage   86.09%   86.09%           
=======================================
  Files          11       11           
  Lines         151      151           
=======================================
  Hits          130      130           
  Misses         21       21           

codecov[bot] avatar Dec 01 '23 16:12 codecov[bot]

Bump

ChrisRackauckas avatar Dec 23 '23 17:12 ChrisRackauckas

> What is this caching doing? Before and after?

Simply put, it does roughly what DifferentialEquations.jl's CI already did, but the complexity now lives inside https://github.com/julia-actions/cache. Keeping it there has let the action improve over time: it now caches the compiled directory, so that precompiled binaries can be reused between jobs, and it also caches packages. Before this patch, DifferentialEquations.jl only cached ~/.julia/artifacts.
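As a rough illustration, a workflow that uses the action could look like the sketch below. This is not the diff from this PR: the file path, job name, Julia version and action version tags are assumptions, and only the `julia-actions/cache` step is the point of the example.

```yaml
# Hypothetical workflow file, e.g. .github/workflows/CI.yml
name: CI
on:
  pull_request:
  push:
    branches: [master]
jobs:
  test:                       # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v1
        with:
          version: '1'        # assumed: latest stable 1.x release
      # Restores cached parts of the Julia depot (for example
      # ~/.julia/artifacts and ~/.julia/compiled) before the build.
      - uses: julia-actions/cache@v1
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1
```

The cache step is placed after `setup-julia` and before the build and test steps, so that anything it restores is available when packages are installed and precompiled.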

rikhuijzer avatar Dec 23 '23 18:12 rikhuijzer

> I think generally in SciML it's better to use `fail-fast: true` since there's already a very high computational burden due to CI in SciML and usually a large number of CI jobs. So IMO everything that fails should fail immediately.
>
> I think that makes sense if you assume developer time has little value

I said this because at Pluto.jl there was a lot of developer time wasted by this. What would happen there is that about a dozen jobs that took about 20 minutes would be cancelled as soon as one job started to fail. In practice, this was problematic because often most jobs were about 80% done with their execution when they were cancelled. By cancelling, this 80% progress was thrown away. Sometimes, this was no problem because the problem was obvious, but sometimes the problem was not obvious and having more information about which jobs did pass could be very helpful in debugging.

rikhuijzer avatar Dec 23 '23 18:12 rikhuijzer

> I said this because at Pluto.jl there was a lot of developer time wasted by this. What would happen there is that about a dozen jobs that took about 20 minutes would be cancelled as soon as one job started to fail. In practice, this was problematic because often most jobs were about 80% done with their execution when they were cancelled. By cancelling, this 80% progress was thrown away. Sometimes, this was no problem because the problem was obvious, but sometimes the problem was not obvious and having more information about which jobs did pass could be very helpful in debugging.

Looking at the Pluto tests, those tests are quite short and light. In comparison, an OrdinaryDiffEq run (https://github.com/SciML/OrdinaryDiffEq.jl/pull/2092) can easily take around 10 hours of compute time split across jobs, which is a lot. With the tens of contributors active across SciML, if we're not actively canceling jobs it can take a few hours just to get an open machine to start running tests, let alone finish. So fail fast generally makes tests finish hours sooner: the queue is no longer clogged, so jobs also start sooner. We're trying to secure more money for more resources, but it's not easy to come by.

> Simply put, it is similar to what DifferentialEquations.jl had but then the complexity is moved inside https://github.com/julia-actions/cache. This allowed cache to come up with some improvements over time including the caching of the compiled directory so that precompiled binaries can be re-used between jobs, and caching of packages. Before this patch, DifferentialEquations.jl only cached ~/.julia/artifacts.

Does this work with the way we have the groups set up in OrdinaryDiffEq?

ChrisRackauckas avatar Dec 23 '23 22:12 ChrisRackauckas

Is there a way to precompile once and then use that for all of the groups?

ChrisRackauckas avatar Dec 23 '23 22:12 ChrisRackauckas

> Is there a way to precompile once and then use that for all of the groups?

I don’t know

rikhuijzer avatar Dec 23 '23 22:12 rikhuijzer

It doesn't look like this made it any faster? https://github.com/SciML/DifferentialEquations.jl/actions/runs/7332505023/job/19966734944?pr=1004#step:4:52

ChrisRackauckas avatar Dec 26 '23 20:12 ChrisRackauckas

Is this incorporated into setup-julia now or something? What's the reason to close?

ChrisRackauckas avatar Apr 16 '24 09:04 ChrisRackauckas

> Is this incorporated into setup-julia now or something? What's the reason to close?

The close was unintentional. I was just removing old forks from GitHub. Sorry for the noise

rikhuijzer avatar Apr 16 '24 09:04 rikhuijzer

@thazhemadam will this kind of caching be addressed in the centralization changes?

ChrisRackauckas avatar Apr 16 '24 09:04 ChrisRackauckas

Yes, I was planning on having it available as a default, with an option to opt out.

thazhemadam avatar Apr 16 '24 11:04 thazhemadam