Costa Huang

Results 96 issues of Costa Huang

## Description ## Types of changes - [ ] Bug fix - [x] New feature - [ ] New algorithm - [ ] Documentation ## Checklist: - [x] I've read...

## Problem Description The current PPO implementations can be improved in the following ways. ### Changes that do not affect performance - [ ] #207 - [x] #203 -...

## Problem Description While CleanRL is standalone and MIT-licensed, it depends on packages released under copyleft licenses such as the GPL. See #201 or https://app.fossa.com/reports/fdbf4b3e-435b-4db2-bc6b-57296284358f Since we are not modifying these...

## Problem Description A lot of the formatting changes are suggested by @Howuhh ### 1. Refactor on `next_done` The current code to handle `done` looks like this ```python next_obs, reward,...

See #218. @yooceii and @kinalmehta have expressed interest in working on this. @kinalmehta is also interested in C51 (#221). @kinalmehta if working with DQN first helps with working on C51,...

@kinalmehta has expressed interest in working on this.

## Problem Description Given the incredible performance of the DDPG + JAX prototype (#187), it's worth prototyping TD3 + JAX as well. @joaogui1 is super experienced with JAX and has...

# Problem description Per [Andrychowicz, et al. (2021)](https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/#Andrychowicz) and anecdotal evidence, value function clipping is not useful. Hence we should remove the following code. https://github.com/vwxyzjn/cleanrl/blob/94a685de9290435623d7cf5e4e770418ddb10a4f/cleanrl/ppo.py#L283-L291 We should do it with...

# Problem description. The regular advantage calculation in PPO is a special case of the GAE advantage calculation when `gae_lambda=1` - we empirically demonstrate this with the debugging output in...
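The equivalence claimed above can be checked with a small, self-contained sketch. This is not CleanRL's code: the function names `gae_advantages` and `plain_advantages`, the single-environment list inputs, and the convention that `dones[t]` flags whether the observation at step `t` starts a new episode are all assumptions made for illustration.

```python
def gae_advantages(rewards, values, dones, next_value, next_done,
                   gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation over one rollout (hypothetical helper)."""
    num_steps = len(rewards)
    advantages = [0.0] * num_steps
    last_gae = 0.0
    for t in reversed(range(num_steps)):
        if t == num_steps - 1:
            # Bootstrap from the value of the observation after the rollout.
            next_nonterminal = 1.0 - next_done
            next_values = next_value
        else:
            next_nonterminal = 1.0 - dones[t + 1]
            next_values = values[t + 1]
        # One-step TD error.
        delta = rewards[t] + gamma * next_values * next_nonterminal - values[t]
        last_gae = delta + gamma * gae_lambda * next_nonterminal * last_gae
        advantages[t] = last_gae
    return advantages


def plain_advantages(rewards, values, dones, next_value, next_done, gamma=0.99):
    """Discounted returns minus values, i.e. the non-GAE advantage (hypothetical helper)."""
    num_steps = len(rewards)
    returns = [0.0] * num_steps
    for t in reversed(range(num_steps)):
        if t == num_steps - 1:
            next_nonterminal = 1.0 - next_done
            next_return = next_value
        else:
            next_nonterminal = 1.0 - dones[t + 1]
            next_return = returns[t + 1]
        returns[t] = rewards[t] + gamma * next_nonterminal * next_return
    return [ret - val for ret, val in zip(returns, values)]


if __name__ == "__main__":
    # Made-up rollout data, used only to compare the two functions.
    rewards = [1.0, 0.5, -0.2, 2.0]
    values = [0.3, 0.1, 0.4, -0.5]
    dones = [0.0, 0.0, 1.0, 0.0]
    gae = gae_advantages(rewards, values, dones, 0.7, 0.0, gae_lambda=1.0)
    plain = plain_advantages(rewards, values, dones, 0.7, 0.0)
    print(max(abs(a - b) for a, b in zip(gae, plain)))
```

With `gae_lambda=1`, the value terms in successive TD errors telescope, so each GAE advantage reduces to the discounted return minus `values[t]`. The two functions should therefore agree up to floating-point error.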