ray icon indicating copy to clipboard operation
ray copied to clipboard

[RLlib] RNNs and RLModules

Open ArturNiederfahrenhorst opened this issue 2 years ago • 3 comments

Why are these changes needed?

This is a fork of https://github.com/ray-project/ray/pull/30997 to experiment with how we incorporate RNNs into RLModules.

ArturNiederfahrenhorst avatar Feb 21 '23 23:02 ArturNiederfahrenhorst

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stale[bot] avatar Apr 02 '23 02:04 stale[bot]

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

stale[bot] avatar Apr 26 '23 01:04 stale[bot]

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

stale[bot] avatar May 12 '23 12:05 stale[bot]

I've included an example that prooves learning with a custom tokenizer. This custom tokenizer makes it so that the model is the same as in our custom_rnn example from the old stack. The following two curves are both tested environments (orange is old, other is new stack).

Screenshot 2023-06-21 at 16 50 33

ArturNiederfahrenhorst avatar Jun 21 '23 23:06 ArturNiederfahrenhorst

So there are two quality issues with this PR which I am intentionally postponing to another follow-up / clean up PR after the branch cut for 2.6:

  1. Having the auto_fold_unfold_time decorator depend on the spec checking module is not the cleanest solution. Also there is a lot of reduntant code between tf / torch which would make maintenance a huge pain. @ArturNiederfahrenhorst We should come back and fix this issue
  2. The overriding of v and r values in compute_advantages does not sound right. In my opinion the caller should write the correct v and r inside the SampleBatch already. We should investigate that further.

kouroshHakha avatar Jun 27 '23 19:06 kouroshHakha