
Reinforcement Learning in PyTorch

64 rlpyt issues

I'm trying to use PPO_LSTM and R2D1 with a multi-agent environment. I was checking the other related [issue](https://github.com/astooke/rlpyt/issues/14), but it seems to be more about DDPG than about recurrent models....

Hi, in async_rl, with a small change I get a better result (about 3x faster convergence):
```python
opt_info = self.algo.optimize_agent(itr, sampler_itr=self.ctrl.sampler_itr.value)
if itr % 5 == 0:  # added this line
    self.agent.send_shared_memory()
```
...
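(For context: in rlpyt's asynchronous mode, `send_shared_memory()` copies the optimizing agent's parameters into shared memory for the sampler processes, so throttling it to every fifth iteration presumably cuts synchronization overhead at the price of the sampler acting on slightly staler parameters.)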

In PyTorch >= 1.4, grad_norm is a torch Tensor (changed in https://github.com/pytorch/pytorch/pull/32020) and not a float, so the logger throws an exception here (`values` is now a list of pytorch...
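A minimal sketch of a workaround (my own, not the project's fix), assuming the norm comes from `torch.nn.utils.clip_grad_norm_`, which returns a Tensor in PyTorch >= 1.4:

```python
import torch

model = torch.nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).sum()
loss.backward()

grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
grad_norm = float(grad_norm)  # cast back to a Python float so the logger's
                              # numeric aggregation over `values` still works
```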

Hi! Documentation is now available! https://rlpyt.readthedocs.io Feel free to post in this issue for minor clarifications / comments, or start a new issue if it's something bigger. Hope this helps!!

Hi Adam, when using a class that inherits from ParallelSamplerBase (e.g. CPUSampler), I set the number of workers by passing a list of CPU affinities into my runner in the...
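For what it's worth, the pattern in rlpyt's bundled examples is that the worker count is implied by `len(workers_cpus)` in the affinity dict handed to the runner; a minimal sketch:

```python
# One sampler worker is launched per CPU listed in workers_cpus.
n_parallel = 8
affinity = dict(
    cuda_idx=0,                            # GPU used for optimization
    workers_cpus=list(range(n_parallel)),  # CPUs pinned to sampler workers
)
```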

The default target_update_interval for DQN-based algorithms is set to 312 and is not changed for any of the variants in the configs (except for R2D1, which seems to be correctly...
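For illustration, overriding the interval per variant might look like the following sketch, assuming rlpyt's `DQN` constructor (where the 312 default lives; the interval counts optimizer steps between target-network syncs):

```python
from rlpyt.algos.dqn.dqn import DQN

algo = DQN(
    double_dqn=True,             # variant flag; the interval stays at 312 unless overridden
    target_update_interval=312,  # optimizer steps between target-network copies
)
```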

Hi, `batch_T (int) – number of time-steps per sample batch`. I don't understand the effect of `batch_T` in samplers, and I see another `batch_T` in R2D1 too. So what is the...
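For readers landing here: in the samplers, a batch has leading dimensions `[batch_T, batch_B]`, i.e. `batch_T` time-steps collected from each of `batch_B` environments per iteration, while R2D1's own `batch_T` sets the length of the training sequences drawn from replay. A minimal sketch, assuming `SerialSampler` and rlpyt's gym wrapper:

```python
from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.envs.gym import gym_make

sampler = SerialSampler(
    EnvCls=gym_make,
    env_kwargs=dict(id="CartPole-v1"),
    batch_T=5,    # time-steps gathered per environment per sampler batch
    batch_B=16,   # environment instances stepped in lockstep
    max_decorrelation_steps=0,
)
# Each sampler iteration yields batch_T * batch_B = 80 transitions,
# shaped with leading dims [5, 16].
```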

Hi Adam, Thanks again for getting rlpyt set up. I am wondering if it is possible to do this when running RL in parallel: within each parallel environment, at the...

For Mujoco envs, it's standard practice to normalize rewards by a running estimate of their standard deviation (e.g. VecNormalize in baselines, NormalizedEnv in rllab). Without it, performance is noticeably...
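A minimal sketch of such a normalizer (my own, in the spirit of baselines' VecNormalize; not part of rlpyt), which scales rewards by a running std-dev estimate of the discounted return and then clips:

```python
import numpy as np

class RunningRewardNorm:
    """Scale rewards by a running std-dev estimate of the discounted return."""

    def __init__(self, gamma=0.99, clip=10.0, eps=1e-8):
        self.gamma, self.clip, self.eps = gamma, clip, eps
        self.ret = 0.0     # running discounted return
        self.count = 1e-4  # samples seen (small init avoids divide-by-zero)
        self.mean = 0.0    # running mean of the return
        self.var = 1.0     # running variance of the return

    def __call__(self, reward):
        self.ret = self.gamma * self.ret + reward
        # Streaming (Welford-style) update of the return's mean and variance.
        self.count += 1.0
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.var += (delta * (self.ret - self.mean) - self.var) / self.count
        scaled = reward / (np.sqrt(self.var) + self.eps)
        return float(np.clip(scaled, -self.clip, self.clip))
```

In use, `self.ret` would be reset to 0.0 at episode boundaries, mirroring what VecNormalize does on `done`.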

What is the status of Distributional DDPG? In the Readme, it says "coming soon".