Functionality for a custom async evaluation function with the new EnvRunner API

Open simonsays1980 opened this issue 1 year ago • 0 comments

Custom evaluation functions are a valuable feature of the old stack. With the new stack, more specifically the new EnvRunner API, evaluation is always run asynchronously, but a user cannot (yet) pass in a custom evaluation function for asynchronous evaluation of policies (RLModules). This PR closes this gap by proposing a custom async evaluation function. It comes with validations that check whether the function can be used at all (only with the new 'EnvRunner' API) and whether the following requirements for the function are fulfilled:

- It is indeed a function.
- Four arguments are defined (for algorithm, eval_workers, weights_ref, weights_seq_no).
- The function indeed uses asynchronous evaluation, i.e. it uses eval_workers.foreach_worker_async.
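
The three requirements above could be checked roughly as in the following sketch. This is not the validation code from this PR: `validate_custom_async_eval_fn`, `good_eval_fn`, and `bad_eval_fn` are hypothetical names, and looking for `foreach_worker_async` in the function's bytecode names is only one assumed way to detect asynchronous evaluation. The example functions are never called, so their bodies are placeholders.

```python
import inspect

# Number of arguments listed above: algorithm, eval_workers,
# weights_ref, weights_seq_no.
EXPECTED_NUM_ARGS = 4


def validate_custom_async_eval_fn(fn):
    """Return a list of problems; an empty list means all checks passed.

    Hypothetical helper, not part of RLlib.
    """
    # Requirement 1: it has to be a plain Python function.
    if not inspect.isfunction(fn):
        return ["not a function"]

    problems = []

    # Requirement 2: exactly four arguments are defined.
    num_args = len(inspect.signature(fn).parameters)
    if num_args != EXPECTED_NUM_ARGS:
        problems.append(f"expected {EXPECTED_NUM_ARGS} arguments, got {num_args}")

    # Requirement 3 (assumed heuristic): the function body references the
    # attribute name `foreach_worker_async` somewhere.
    if "foreach_worker_async" not in fn.__code__.co_names:
        problems.append("does not use foreach_worker_async")

    return problems


def good_eval_fn(algorithm, eval_workers, weights_ref, weights_seq_no):
    # Placeholder body: only the async call matters for the check.
    eval_workers.foreach_worker_async(lambda worker: worker.sample())
    return {}


def bad_eval_fn(algorithm, eval_workers):
    # Too few arguments and no async call.
    return {}


print(validate_custom_async_eval_fn(good_eval_fn))
print(validate_custom_async_eval_fn(bad_eval_fn))
```

A bytecode-name check like this only shows that the method is referenced, not that it is actually invoked on eval_workers, so it should be read as a cheap sanity check rather than a guarantee.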

Why are these changes needed?

Custom evaluation plays a role in many use cases. For example, a user often wants to evaluate a policy not only against its own historic performance, but also against a benchmark that the policy competes with.

Related issue number

Checks

- [x] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- [x] I've run scripts/format.sh to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
    - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
    - [x] Unit tests
    - [x] Release tests
    - [ ] This PR is not tested :(

Feb 10 '24 13:02 simonsays1980
