
Reinforcement Learning in PyTorch

64 rlpyt issues

I'm trying to use PPO_LSTM and R2D1 with a multi-agent environment. I was checking the other related [issue](https://github.com/astooke/rlpyt/issues/14), but it seems to be more about DDPG than about recurrent models....

Hi, in async_rl, with a small change I get a better result (about 3x faster convergence):
```python
opt_info = self.algo.optimize_agent(itr, sampler_itr=self.ctrl.sampler_itr.value)
if itr % 5 == 0:  # added this line
    self.agent.send_shared_memory()
```
...
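(For context: in rlpyt's asynchronous mode, `send_shared_memory()` copies the optimizing agent's parameters into shared memory for the sampler processes, so throttling it to every fifth iteration presumably cuts synchronization overhead at the price of the sampler acting on slightly staler parameters.)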

In PyTorch >= 1.4, grad_norm is a torch Tensor (changed in https://github.com/pytorch/pytorch/pull/32020) and not a float, so the logger throws an exception here (`values` is now a list of pytorch...
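A minimal sketch of a workaround (my own, not the project's fix), assuming the norm comes from `torch.nn.utils.clip_grad_norm_`, which returns a Tensor in PyTorch >= 1.4:

```python
import torch

model = torch.nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).sum()
loss.backward()

grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
grad_norm = float(grad_norm)  # cast back to a Python float so the logger's
                              # numeric aggregation over `values` still works
```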

Hi! Documentation is now available! https://rlpyt.readthedocs.io Feel free to post in this issue for minor clarifications / comments, or start a new issue if it's something bigger. Hope this helps!!

Hi Adam, when using a class that inherits from ParallelSamplerBase (e.g. CPUSampler), I set the number of workers by passing a list of CPU affinities into my runner in the...
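For what it's worth, the pattern in rlpyt's bundled examples is that the worker count is implied by `len(workers_cpus)` in the affinity dict handed to the runner; a minimal sketch:

```python
# One sampler worker is launched per CPU listed in workers_cpus.
n_parallel = 8
affinity = dict(
    cuda_idx=0,                            # GPU used for optimization
    workers_cpus=list(range(n_parallel)),  # CPUs pinned to sampler workers
)
```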

The default target_update_interval for DQN-based algorithms is set to 312 and is not changed for any of the variants in the configs (except for R2D1, which seems to be correctly...
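For illustration, overriding the interval per variant might look like the following sketch, assuming rlpyt's `DQN` constructor (where the 312 default lives; the interval counts optimizer steps between target-network syncs):

```python
from rlpyt.algos.dqn.dqn import DQN

algo = DQN(
    double_dqn=True,             # variant flag; the interval stays at 312 unless overridden
    target_update_interval=312,  # optimizer steps between target-network copies
)
```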

Hi, `batch_T (int) – number of time-steps per sample batch`. I don't understand the effect of `batch_T` in samplers, and I see another `batch_T` in R2D1 too. So what is the...
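For readers landing here: in the samplers, a batch has leading dimensions `[batch_T, batch_B]`, i.e. `batch_T` time-steps collected from each of `batch_B` environments per iteration, while R2D1's own `batch_T` sets the length of the training sequences drawn from replay. A minimal sketch, assuming `SerialSampler` and rlpyt's gym wrapper:

```python
from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.envs.gym import gym_make

sampler = SerialSampler(
    EnvCls=gym_make,
    env_kwargs=dict(id="CartPole-v1"),
    batch_T=5,    # time-steps gathered per environment per sampler batch
    batch_B=16,   # environment instances stepped in lockstep
    max_decorrelation_steps=0,
)
# Each sampler iteration yields batch_T * batch_B = 80 transitions,
# shaped with leading dims [5, 16].
```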

Hi Adam, Thanks again for getting rlpyt set up. I am wondering if it is possible to do this when running RL in parallel: within each parallel environment, at the...

For Mujoco envs, it's standard practice to normalize rewards by a running estimate of their standard deviation (e.g. VecNormalize in baselines, NormalizedEnv in rllab). Without it, performance is noticeably...
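A minimal sketch of such a normalizer (my own, in the spirit of baselines' VecNormalize; not part of rlpyt), which scales rewards by a running std-dev estimate of the discounted return and then clips:

```python
import numpy as np

class RunningRewardNorm:
    """Scale rewards by a running std-dev estimate of the discounted return."""

    def __init__(self, gamma=0.99, clip=10.0, eps=1e-8):
        self.gamma, self.clip, self.eps = gamma, clip, eps
        self.ret = 0.0     # running discounted return
        self.count = 1e-4  # samples seen (small init avoids divide-by-zero)
        self.mean = 0.0    # running mean of the return
        self.var = 1.0     # running variance of the return

    def __call__(self, reward):
        self.ret = self.gamma * self.ret + reward
        # Streaming (Welford-style) update of the return's mean and variance.
        self.count += 1.0
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.var += (delta * (self.ret - self.mean) - self.var) / self.count
        scaled = reward / (np.sqrt(self.var) + self.eps)
        return float(np.clip(scaled, -self.clip, self.clip))
```

In use, `self.ret` would be reset to 0.0 at episode boundaries, mirroring what VecNormalize does on `done`.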

What is the status of Distributional DDPG? In the Readme, it says "coming soon".