Justin Yu

23 issues by Justin Yu

In debug mode, saves to "softlearning/videos" instead of under the user's home directory "~/ray_results/...". Works fine using `run_example_local`.

## Why are these changes needed?

### The problem

- When running *synchronous PBT* while checkpointing every time a perturbation happens, the experiment can reach a state where trial A...

bug
tune

## Why are these changes needed?

This change is needed for the PBT algorithm to run correctly in the case where persistent checkpoints and in-memory checkpoints are both being saved....

tune

Skips a zoopt searcher test that's causing the `test_searchers` suite in CI to be flaky. Skipping as this is not a Tune issue and needs to be fixed in the...

- ddfeae3c86b997e7a0bf1391f37d2831a2da3542 FAILED [Buildkite :octopus: Tune tests and examples (small)](https://buildkite.com/ray-project/oss-ci-build-branch/builds/1195#01849a2c-4c71-4d3e-ae06-33f0c90a61e8)
- 073e7bc04d989607848552537f9f5ac91fa07d85 FAILED [Buildkite :octopus: Tune tests and examples (small)](https://buildkite.com/ray-project/oss-ci-build-branch/builds/1194#01849a26-f367-434d-8c81-32cbe16ad16d)
- df76ac7975334a5fec7affcc910076ca435fb772 FAILED [Buildkite :octopus: Tune tests and examples (small)](https://buildkite.com/ray-project/oss-ci-build-branch/builds/1152#01848b8b-8cf0-4522-a079-fb14d0b365ec)...

tune
stale
flaky-tracker

This PR de-duplicates the batch inference template by making it the same as the existing pytorch gpu batch inference example. There still needs to be a copy due to relative...

tests-ok

## Why are these changes needed?

https://github.com/ray-project/ray/pull/30895 upgraded the pinned version of `modin`, removing support for python

Closes https://github.com/ray-project/ray/issues/36299

## Checks

- [ ] I've signed off every commit (by using...

release-test

[Latest job link](https://console.anyscale-staging.com/o/anyscale-internal/jobs/prodjob_xstq7e5eqi59lzdx3u68xbprwu)

This is flaky because we only run 2 trials: the network architectures of both trials will usually be the same, so checkpoints can...

P1
tune
train
release-test

## Why are these changes needed?

The `bohb_example` is super flaky after https://github.com/ray-project/ray/pull/35338 modified the example to run more trials. This is just a patch fix to deflake the example....

## Why are these changes needed?

This PR fixes the `air_example_gptj_deepspeed_fine_tuning.gce` release test. It was failing due to our GCE nodes not having an AWS credentials file. This is not...

release-test