tianshou issues

speed testing in case of parallel/distributed computation

5

- [x] I have marked all applicable categories:
    + [ ] exception-raising bug
    + [ ] RL algorithm bug
    + [ ] documentation request (i.e. "X is missing from...

AlessandroZavoli

question

fp16 training

- [ ] I have marked all applicable categories:
    + [ ] exception-raising bug
    + [ ] RL algorithm bug
    + [ ] documentation request (i.e. "X is missing from...

Trinkle23897

enhancement

Please consider Hydra

2

Hi @thu-ml. tianshou looks awesome. I am the author of [Hydra](https://hydra.cc). I think you should definitely check it out. It can probably make your life much easier when dealing with...

omry

enhancement

V-trace support?

2

- [x] I have marked all applicable categories:
    + [ ] exception-raising bug
    + [ ] RL algorithm bug
    + [ ] documentation request (i.e. "X is missing from the...

szrlee

enhancement

Can running and training be separated?

2

Can running and training be separated? For example, we deploy on the cloud, send data to the cloud for training, and issue policies to local hosts intermittently or in real...

kikikio

enhancement

Tests are failing for `RayEnvWorker`

- [x] I have marked all applicable categories:
    + [x] exception-raising bug
    + [ ] RL algorithm bug
    + [ ] documentation request (i.e. "X is missing from the documentation.")...

Markus28

PPO: wrong old log prob

## RL algorithm bug

The probability ratio should use the action probability saved under the original weights that took this action. It is currently recomputed with the updated weights, which is incorrect....

ikamensh

question
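The distinction drawn in the PPO report above can be sketched in a few lines. This is a minimal, standard-library illustration and not tianshou's implementation: the helper names `softmax_log_prob` and `clipped_surrogate`, the two-action logits, and the `eps_clip` value of 0.2 are all assumptions made for the example. It shows that the ratio is informative only when the old log-probability is stored at collection time; if the "old" value is recomputed with the updated weights, the ratio is always 1.

```python
import math


def softmax_log_prob(logits, action):
    """Log-probability of `action` under a softmax over `logits`."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[action] - log_z


def clipped_surrogate(new_logp, old_logp, advantage, eps_clip=0.2):
    """PPO clipped objective for one sample, given a stored old log-prob."""
    ratio = math.exp(new_logp - old_logp)
    clipped = min(max(ratio, 1.0 - eps_clip), 1.0 + eps_clip)
    return min(ratio * advantage, clipped * advantage)


# Collection time: store the log-prob under the weights that chose the action.
old_logits = [0.0, 0.0]  # hypothetical policy output before the update
action = 0
old_logp = softmax_log_prob(old_logits, action)

# After a gradient step the policy output has changed.
new_logits = [1.0, 0.0]  # hypothetical policy output after the update
new_logp = softmax_log_prob(new_logits, action)

# Ratio against the stored value reflects how far the policy has moved.
correct_ratio = math.exp(new_logp - old_logp)

# Recomputing the "old" log-prob with the updated weights collapses the ratio.
recomputed_ratio = math.exp(new_logp - softmax_log_prob(new_logits, action))

objective = clipped_surrogate(new_logp, old_logp, advantage=1.0)
```

With the stored value, the ratio exceeds `1 + eps_clip` here, so the objective is clipped. With the recomputed value, clipping can never trigger.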

In PPOPolicy, the ratio is computed with requires_grad `True`.

4

This bug causes exploding gradients when an action mask is added in the `dist_fn`.

imerme

bug

I trained my env using "tianshou/test/continuous/test_redq.py", but there are some bugs

15

“TypeError: only integer tensors of a single element can be converted to an index” is raised on line 181 of “test_ppo.py”. Then I tried changing ` for epoch, epoch_stat, info in...

zhen3072

question

Implementation advice for calling a GPU reward function using SubprocVectorEnv

3

- [x] I have marked all applicable categories:
    + [ ] exception-raising bug
    + [ ] RL algorithm bug
    + [ ] documentation request (i.e. "X is missing from the...

hedy14

question
