[RLlib] RNNs and RLModules
Why are these changes needed?
This is a fork of https://github.com/ray-project/ray/pull/30997 to experiment with how we incorporate RNNs into RLModules.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
- If you'd like to keep this open, just leave any comment, and the stale label will be removed.
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.
Please feel free to reopen or open a new issue if you'd still like it to be addressed.
Again, you can always ask for help on our discussion forum or Ray's public slack channel.
Thanks again for opening the issue!
I've included an example that proves learning with a custom tokenizer. The custom tokenizer makes the model identical to the one in our custom_rnn example from the old stack. The following two curves show both tested environments (orange is the old stack, the other is the new stack).
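To make the tokenizer-plus-RNN layout concrete, here is a minimal, framework-free sketch of the pattern: a tokenizer encodes each raw observation into a fixed-size vector, and a recurrent core consumes that token sequence. All names here are illustrative stand-ins, not RLlib's actual RLModule API.

```python
import numpy as np

class ToyTokenizer:
    """Linear encoder standing in for a learned observation tokenizer."""
    def __init__(self, obs_dim, token_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((obs_dim, token_dim)) * 0.1

    def __call__(self, obs):
        # obs: [T, obs_dim] -> tokens: [T, token_dim]
        return np.tanh(obs @ self.w)

class ToyRNNCore:
    """Minimal Elman-style recurrent core over tokenized inputs."""
    def __init__(self, token_dim, hidden_dim, seed=1):
        rng = np.random.default_rng(seed)
        self.w_in = rng.standard_normal((token_dim, hidden_dim)) * 0.1
        self.w_h = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1

    def forward(self, tokens, h):
        # tokens: [T, token_dim], h: [hidden_dim] -> outs: [T, hidden_dim]
        outs = []
        for t in range(tokens.shape[0]):
            h = np.tanh(tokens[t] @ self.w_in + h @ self.w_h)
            outs.append(h)
        return np.stack(outs), h
```

Swapping the tokenizer is what lets the new stack reproduce the old custom_rnn model: the recurrent core stays fixed while the observation encoding changes.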
There are two quality issues with this PR that I am intentionally postponing to a follow-up / cleanup PR after the branch cut for 2.6:
- Having the `auto_fold_unfold_time` decorator depend on the spec-checking module is not the cleanest solution. There is also a lot of redundant code between tf and torch, which would make maintenance a huge pain. @ArturNiederfahrenhorst We should come back and fix this issue.
- The overriding of v and r values in `compute_advantages` does not seem right. In my opinion, the caller should already write the correct v and r into the `SampleBatch`. We should investigate that further.
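For context on what a fold/unfold-time decorator does, here is a hedged, numpy-only sketch (names and signatures are illustrative, not RLlib's actual implementation): recurrent batches arrive as `[B, T, ...]`, but most time-agnostic layers expect `[B*T, ...]`, so the decorator folds the time axis in before calling the wrapped function and unfolds the result afterwards.

```python
import numpy as np

def auto_fold_unfold(fn):
    """Fold [B, T, ...] -> [B*T, ...] around a time-agnostic op."""
    def wrapped(x, *args, **kwargs):
        b, t = x.shape[:2]
        folded = x.reshape((b * t,) + x.shape[2:])    # fold time axis
        out = fn(folded, *args, **kwargs)
        return out.reshape((b, t) + out.shape[1:])    # unfold time axis
    return wrapped

@auto_fold_unfold
def dense_layer(x):
    # stand-in for an op applied independently per timestep
    return x * 2.0
```

A decorator like this is convenient, but it is easy to see why coupling it to spec checking and duplicating it across tf and torch creates the maintenance burden described above.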