[RLlib; Offline RL] - Validate episodes before adding them to the buffer.
Why are these changes needed?
At the moment the `OfflinePreLearner` samples recorded episodes or `SampleBatch`es from a `ray.data` dataset and then adds them to a buffer which coordinates the time step sampling. It can happen that:
- The sampled episodes are duplicates (either in the batch or in the buffer)
- The sampled episodes are neither `terminated` nor `truncated` and therefore can, and in practice will, be fragmented in time order (e.g., we might sample an episode chunk that contains timesteps 11 to 21 before we sample the chunk that contains timesteps 0 to 11).
In both cases the buffer would raise an error as soon as `SingleAgentEpisode.concat` is called.
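To make the fragment problem concrete, here is a minimal sketch of why out-of-order chunks cannot be concatenated. `EpisodeChunk`, its fields, and its `concat` rules are hypothetical stand-ins written for this illustration; they are not the RLlib `SingleAgentEpisode` implementation and only mimic the kind of continuity check described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EpisodeChunk:
    """Hypothetical stand-in for an episode chunk (not the RLlib class)."""

    id_: str
    t_started: int  # first timestep covered by this chunk
    rewards: List[float] = field(default_factory=list)

    @property
    def t(self) -> int:
        # Timestep at which the next chunk is expected to start.
        return self.t_started + len(self.rewards)

    def concat(self, other: "EpisodeChunk") -> None:
        # Assumed rules for this sketch: same episode, no gap in time.
        if other.id_ != self.id_:
            raise ValueError("chunks belong to different episodes")
        if other.t_started != self.t:
            raise ValueError(
                f"expected a chunk starting at t={self.t}, "
                f"got t={other.t_started}"
            )
        self.rewards.extend(other.rewards)


def try_concat(first: EpisodeChunk, second: EpisodeChunk) -> str:
    """Return 'ok' or the error message raised by the concatenation."""
    try:
        first.concat(second)
        return "ok"
    except ValueError as err:
        return f"error: {err}"


# In order: timesteps 0 to 11 first, then 11 to 21.
early = EpisodeChunk("ep-0", t_started=0, rewards=[1.0] * 11)
late = EpisodeChunk("ep-0", t_started=11, rewards=[1.0] * 10)
print(try_concat(early, late))

# Out of order: timesteps 11 to 21 arrive before 0 to 11.
late = EpisodeChunk("ep-0", t_started=11, rewards=[1.0] * 10)
early = EpisodeChunk("ep-0", t_started=0, rewards=[1.0] * 11)
print(try_concat(late, early))
```

In this toy model the first call succeeds and the second fails because the earlier chunk does not start where the later one ended.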
This PR introduces a `_validate_episodes` method to the `OfflinePreLearner` that checks episodes for duplicates and fragments and returns only unique episodes that are not yet in the buffer. It disallows incomplete episodes and thereby ensures that no fragments are added. Users are responsible for recording only complete episodes.
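The validation logic described here could look roughly like the following sketch. It is not the code from this PR: `RecordedEpisode`, `validate_episodes`, and the `buffer_episode_ids` parameter are hypothetical names chosen for illustration, and raising a `ValueError` on an incomplete episode is an assumption about how "disallows" might be enforced.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class RecordedEpisode:
    """Hypothetical stand-in for a recorded episode (not the RLlib class)."""

    id_: str
    is_terminated: bool = False
    is_truncated: bool = False

    @property
    def is_done(self) -> bool:
        return self.is_terminated or self.is_truncated


def validate_episodes(
    episodes: Iterable[RecordedEpisode],
    buffer_episode_ids: Set[str],
) -> List[RecordedEpisode]:
    """Return unique, complete episodes that are not yet in the buffer."""
    # Start from the IDs already stored so buffer duplicates are skipped too.
    seen = set(buffer_episode_ids)
    unique: List[RecordedEpisode] = []
    for episode in episodes:
        # Assumption: incomplete episodes are rejected with an error.
        if not episode.is_done:
            raise ValueError(
                f"episode {episode.id_!r} is neither terminated nor truncated"
            )
        # Skip duplicates within the batch and against the buffer.
        if episode.id_ in seen:
            continue
        seen.add(episode.id_)
        unique.append(episode)
    return unique


batch = [
    RecordedEpisode("a", is_terminated=True),
    RecordedEpisode("a", is_terminated=True),  # duplicate within the batch
    RecordedEpisode("b", is_truncated=True),
    RecordedEpisode("c", is_terminated=True),  # already in the buffer
]
valid = validate_episodes(batch, buffer_episode_ids={"c"})
print([episode.id_ for episode in valid])  # ['a', 'b']
```

Seeding the `seen` set with the buffer's episode IDs lets a single pass handle both kinds of duplicates named above.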
Related issue number
Checks
- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [x] Unit tests
  - [ ] Release tests
  - [ ] This PR is not tested :(