tianshou
V-trace support?
- [x] I have marked all applicable categories:
  - [ ] exception-raising bug
  - [ ] RL algorithm bug
  - [ ] documentation request (i.e. "X is missing from the documentation.")
  - [x] new feature request
- [x] I have visited the source website, and in particular read the known issues
- [x] I have searched through the issue tracker for duplicates
- [ ] I have mentioned version numbers, operating system and environment, where applicable:
  `import tianshou, sys; print(tianshou.__version__, sys.version, sys.platform)`
I suggest that you refer to this paper: IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. I do have a plan to add the algorithm to this platform, and I've already made it work, but the code is not compatible with the current platform yet. It will still take some time to adjust.
Well, thanks @fengredrum. Actually, I am trying to make our algorithm "DAPO" from this paper, originally implemented in our "memorie" distributed framework, compatible with Tianshou. DAPO relies on the behaviour policy's probability $\pi_{old}(a_t|s_t)$ (necessary), multi-step bootstrapping (necessary), and V-trace (not necessary, but it subsumes the previous two features). Therefore, if V-trace is supported, it will be easier for me to reimplement DAPO. Last but not least, empirical studies have shown that DAPO outperforms IMPALA (referred to as (one-step) entropy augmentation in the DAPO paper).
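For reference, here is a minimal sketch of the V-trace target computation as described in the IMPALA paper; the function name and signature are illustrative only and are not part of Tianshou's API. It assumes per-step log-probabilities under the behaviour policy are stored at collection time, which is exactly the $\pi_{old}(a_t|s_t)$ requirement mentioned above.

```python
# Illustrative V-trace target computation for a single trajectory (not Tianshou code).
import torch

def vtrace_targets(rewards, values, bootstrap_value, log_pi, log_mu,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets v_t.

    rewards, values, log_pi, log_mu: tensors of shape [T]
    bootstrap_value: scalar tensor V(s_T) used to bootstrap the final step
    log_pi / log_mu: log-probs of the taken actions under the target /
    behaviour policies.
    """
    rhos = torch.exp(log_pi - log_mu)              # importance ratios pi / mu
    clipped_rhos = torch.clamp(rhos, max=rho_bar)  # rho_t in the paper
    clipped_cs = torch.clamp(rhos, max=c_bar)      # c_t in the paper

    next_values = torch.cat([values[1:], bootstrap_value.view(1)])
    deltas = clipped_rhos * (rewards + gamma * next_values - values)

    # Backward recursion: v_t - V(s_t) = delta_t + gamma * c_t * (v_{t+1} - V(s_{t+1}))
    acc = torch.zeros_like(bootstrap_value)
    corrections = torch.zeros_like(values)
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * clipped_cs[t] * acc
        corrections[t] = acc
    return values + corrections
```

The returned targets can then be used as regression targets for the value head and, together with the clipped importance ratios, to weight the policy-gradient term, which is the part DAPO would reuse.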
Closing as stale (and lacking description)