Description

Added implementation of TQC algorithm. I've tried to stay as close to the original PyTorch implementation as possible.

Implementation is based on the implementation of SAC in the examples (here).
The main differences to SAC are in the utils.py file, where the critic and actor architectures and the TQC loss are implemented.
Added a TQCLoss class, which is a subclass of LossModule. Currently this class is implemented in the utils.py file, but this could probably be enhanced a little bit and then added to the objectives directory of torchrl, where for instance the SACLoss classes etc. are implemented. Please feel free to comment on whether you think this should be done and what changes to the current implementation of my TQCLoss implementation you would recommend.
I have tested the code on the HalfCheetah-v4 environment and it seems to perform well (see screenshot below). I am happy to conduct more extensive tests (such as several runs with different seed values to assess robustness, training on different environments) if you think that would be a helpful check.

eval reward

training losses

Motivation and Context

This issue closes #1623

[x] I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

[x] Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!

[x] I have read the CONTRIBUTION guide (required)
[x] My change requires a change to the documentation.*
[ ] I have updated the tests accordingly (required for a bug fix or a new feature).
[ ] I have updated the documentation accordingly.

Might be a good idea to add a few test runs of TQC to the EXAMPLES.md file.

Oct 17 '23 11:10 maxweissenbacher

Hi @maxweissenbacher!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

Oct 17 '23 11:10 facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

Oct 17 '23 12:10 facebook-github-bot

Thank you Vincent! Will get to work on those changes now. I think adding the TQCLoss to the objectives would be a great idea. If you could assist in writing tests, that would be great.

Oct 19 '23 14:10 maxweissenbacher

Just to keep you in the loop: I'm working on a separate branch that integrates the loss in objectives, I'll keep you posted and probably ask for your help :)

Oct 25 '23 15:10 vmoens

Amazing! Definitely do let me know if I can help. Would love to hear your opinion on the issue with the value estimators (see the comment and the latest commit).

Oct 25 '23 15:10 maxweissenbacher

@maxweissenbacher I made a PR against your main branch with proposed changes https://github.com/maxweissenbacher/rl/pull/1 (nit: in the future, it's best to branch out before making a PR, working from your main branch can cause some hustle for other developers working with you :) )

Nov 09 '23 14:11 vmoens

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1631

:page_facing_up: Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

:white_check_mark: You can merge normally! (18 Unrelated Failures)

As of commit 2e56b5b41d76b884d3cc493b6e96da3af98154c8 with merge base c7d4764e787e4be903f7b5f03b6008f00e9b23a1 ():

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Nov 10 '23 15:11 pytorch-bot[bot]

Hi @maxweissenbacher Would you like us to take it from there? I can find some time to make this mergeable, just wanted to check with you first if you still wanted to work on it?

Nov 27 '23 11:11 vmoens

Hi Vincent, my apologies for leaving this dormant for so long, I have been busy with moving house. I will get to work on it this week!

Nov 27 '23 11:11 maxweissenbacher

Hi Vincent, my apologies for leaving this dormant for so long, I have been busy with moving house. I will get to work on it this week!

No worry at all! I was just checking in to see if there was anything blocking you really :)

Nov 27 '23 11:11 vmoens

rl
rl copied to clipboard

[Algorithm] Added TQC

Description

Motivation and Context

Types of changes

Checklist

Action Required

Process

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1631

:white_check_mark: You can merge normally! (18 Unrelated Failures)

rl rl copied to clipboard

[Algorithm] Added TQC

Description

Motivation and Context

Types of changes

Checklist

Action Required

Process

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1631

:white_check_mark: You can merge normally! (18 Unrelated Failures)

rl
rl copied to clipboard