ClimaLand.jl

Benchmark on buildkite is unpredictable

Open imreddyTeja opened this issue 1 year ago • 2 comments

Describe the bug

  • At one point, the land.jl benchmark was failing with no error or test failure message.
  • It seems to have been hitting the time limit of the Slurm job because it was getting stuck in CUDA profiling.
  • This was also an issue with the Richards benchmark.
  • After trying again a few days later, benchmarks of the same commits passed.
  • As of Nov 4th, the benchmarks mostly work, but occasionally one still gets hung up in CUDA profiling.
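
One way a silent hang like this could be surfaced is to run the benchmark under an explicit deadline shorter than the Slurm time limit, so that a stuck run is reported as a timeout instead of ending with no message. The following is a minimal sketch only, not how the ClimaLand pipeline is set up; `run_with_deadline` and the command passed to it are hypothetical names introduced for illustration.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s):
    """Run `cmd` and classify the outcome.

    Returns a (status, returncode) tuple where status is one of
    "ok", "failed", or "timeout". Hypothetical helper, not part of any pipeline.
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # if it has not finished after timeout_s seconds.
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    status = "ok" if proc.returncode == 0 else "failed"
    return (status, proc.returncode)


if __name__ == "__main__":
    # Stand-in for a benchmark command: a child process that sleeps
    # longer than the deadline, mimicking a run stuck in profiling.
    hanging_cmd = [sys.executable, "-c", "import time; time.sleep(10)"]
    status, code = run_with_deadline(hanging_cmd, timeout_s=0.5)
    print(f"status={status} returncode={code}")
```

With a wrapper along these lines, a hung run produces an explicit "timeout" result in the job log. Note that only the direct child process is killed on timeout; any processes it spawned would need separate handling.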

To Reproduce

Run a benchmark by adding the "Run Benchmarks" tag.

imreddyTeja, Oct 29 '24 21:10

It looks like the land.jl benchmark passed in this run yesterday, but the overall job did fail. Do you have a link to the land.jl run failing?

juliasloan25, Oct 30 '24 23:10

> It looks like the land.jl benchmark passed in this run yesterday, but the overall job did fail. Do you have a link to the land.jl run failing?

https://buildkite.com/clima/climaland-benchmark/builds/2141#0192e9ae-bbab-4eaf-9552-dc80558d3d89

imreddyTeja, Nov 06 '24 17:11

Closing because lately the only failure comes from #665.

juliasloan25, Jul 08 '25 21:07