Alexander Nikulin

Results 30 comments of Alexander Nikulin

@vwxyzjn I'd appreciate it if you could take a quick look at the code (without going into details) to check that I match the style of the rest of the library.

I also don't quite understand the decision to evaluate an agent by episodic reward and with stochastic actions. This is especially noticeable with --capture-video as it slows down the training...

> What is the problem and how is it related to capture-video

Always capturing video from one of the envs during training has a noticeable overhead (3x slower on my...

Yup, I've started running some experiments. I'll test on Swimmer-v3 first, as the results on it are very different from the other environments with PPO (for...

Well, the paper gives only the sweep values they used, not the final best ones. :disappointed:

Yes, that's it! The most important parameters are given here only as a grid for a sweep. It also doesn't specify `num-envs`/`num-steps`, but I suppose we could leave them...

@vwxyzjn So, what is the policy on submitting runs to wandb? Should I first experiment in my local private project and then re-run the final evaluation in `openrlbenchmark/cleanrl`? Or...

First sanity check on 3 seeds: it seems to be working as expected on Swimmer-v3. Even better than in the paper, but they use more seeds. ![W B Chart 22 06 2022,...

It is a deep mystery to me why it works so well on this particular environment. Algorithms based on discounted reward can only [solve](https://github.com/thu-ml/tianshou/issues/401) it if you set `gamma=0.9999`, but...
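For context on the `gamma=0.9999` remark, here is a minimal sketch of the usual "effective horizon" heuristic, `1 / (1 - gamma)`, for discounted returns. The helper names are hypothetical, and the 1000-step episode length is an assumption for illustration rather than something stated in the thread. It shows only how quickly a late reward is discounted away for common values of `gamma`; it does not claim to explain the Swimmer-v3 results.

```python
def effective_horizon(gamma: float) -> float:
    """Rough number of steps over which discounted rewards still matter."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


def discount_weight(gamma: float, step: int) -> float:
    """Weight gamma**step applied to a reward received `step` steps ahead."""
    return gamma ** step


# Assumed episode length, for illustration only.
EPISODE_LENGTH = 1000

for gamma in (0.99, 0.999, 0.9999):
    horizon = effective_horizon(gamma)
    weight = discount_weight(gamma, EPISODE_LENGTH)
    print(f"gamma={gamma}: horizon ~ {horizon:.0f} steps, "
          f"weight at step {EPISODE_LENGTH} = {weight:.3g}")
```

With `gamma=0.99` a reward 1000 steps ahead is almost entirely discounted away, while with `gamma=0.9999` it keeps most of its weight, which brings the discounted objective much closer to an average-reward one.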

@vwxyzjn Results for APO on Gym Mujoco will be in this report: https://wandb.ai/openrlbenchmark/cleanrl/reports/-WIP-APO-on-Gym-Mujoco---VmlldzoyMjEwMjY4 Feel free to edit or suggest changes. Runtime should ideally be the same as PPO (as there is no...