[Train/CI] Fix flaky `test_reserved_cpu_warnings`

Open Yard1 opened this issue 2 years ago • 0 comments

Signed-off-by: Antoni Baum [email protected]

Why are these changes needed?

The flakiness appears to have been caused by Ray tasks and actors sometimes being kept alive between `fit` calls until garbage collection kicked in to kill them. As a result, the `ray.available_resources()` call in `TunerInternal._maybe_warn_resource_contention` returned fewer available CPUs than the test expected. This is fixed by explicitly calling `gc.collect()` between `fit` calls.
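To illustrate the mechanism without needing a Ray cluster, here is a minimal pure-Python sketch. `CpuPool`, `FakeActor`, and `fake_fit` are hypothetical stand-ins invented for this example and are not part of Ray or of this PR's diff. The sketch only assumes that the lingering objects are held by a reference cycle, so their resources come back once the garbage collector runs.

```python
import gc


class CpuPool:
    """Hypothetical stand-in for the cluster's CPU accounting (not a Ray API)."""

    def __init__(self, total):
        self.total = total
        self.available = total


class FakeActor:
    """Toy object that holds CPUs until it is finalized."""

    def __init__(self, pool, cpus):
        self.pool = pool
        self.cpus = cpus
        pool.available -= cpus
        # Reference cycle: reference counting alone cannot free this object,
        # so it stays alive until the cyclic garbage collector runs.
        self.self_ref = self

    def __del__(self):
        # Give the CPUs back when the object is finally collected.
        self.pool.available += self.cpus


def fake_fit(pool):
    """Stand-in for a fit call that leaves an actor behind."""
    FakeActor(pool, cpus=2)  # out of scope on return, but the cycle keeps it alive


gc.disable()  # keep the demo deterministic; automatic collection is off
try:
    pool = CpuPool(total=4)
    fake_fit(pool)
    before = pool.available  # CPUs still held by the lingering object
    gc.collect()             # explicit collection between "fit" calls
    after = pool.available   # CPUs released again
finally:
    gc.enable()

print(before, after)
```

In the real test the same idea applies: calling `gc.collect()` after one `fit` call and before the next gives lingering actors a chance to be cleaned up before `ray.available_resources()` is consulted.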

Related issue number

Closes https://github.com/ray-project/ray/issues/31334

Checks

  • [ ] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

Yard1, Jan 03 '23 21:01