Benchmark on buildkite is unpredictable
Describe the bug
- At one point, the land.jl benchmark was failing with no error or test failure message.
- It seems to have been hitting the time limit of the Slurm job because it was getting stuck in CUDA profiling.
- This was also an issue with the Richards benchmark.
- After trying again a few days later, benchmarks of the same commits passed.
- As of Nov 4th, the benchmarks mostly work, but occasionally one still gets hung up in CUDA profiling.
To Reproduce
Run a benchmark by adding the "Run Benchmarks Tag"
Or even better, add it as a unit test and open a pull request.
Project
If not using the `examples` project:
```
paste your Project.toml here.
```
```
paste your Manifest.toml here.
```
System details
Any relevant system information:
- Julia version
- operating system
- modules loaded on cluster (`module list`)
Related issues / PRs
Please add any relevant links.
It looks like the land.jl benchmark passed in this run yesterday, but the overall job did fail. Do you have a link to the land.jl run failing? https://buildkite.com/clima/climaland-benchmark/builds/2141#0192e9ae-bbab-4eaf-9552-dc80558d3d89
Closing because lately the only failure comes from #665.