At the end, add a `trainer.evaluate()` call. That will make sure the greedy policy is followed each time. It should give 500.
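For context, a minimal sketch of where that call would go, assuming the genrl-style API (the exact import paths and constructor arguments here are my assumption and may differ across versions; only `trainer.evaluate()` is the call under discussion):

```python
# Assumed genrl-style setup -- import paths are an assumption, not verified.
from genrl import DQN
from genrl.deep.common import OffPolicyTrainer
from genrl.environments import VectorEnv

env = VectorEnv("CartPole-v1")
agent = DQN("mlp", env)
trainer = OffPolicyTrainer(agent, env)

trainer.train()     # learns with the exploratory (epsilon-greedy) policy
trainer.evaluate()  # greedy rollouts; CartPole-v1 caps the episode return at 500
```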
What `trainer.evaluate()` does is make sure that whenever an action is selected, the deterministic policy is followed (see the VPG implementation). Not sure specifically about VPG, but it's...
Yes, it should. But the stochasticity may remain the same. That way, the agent may have already learned the optimal policy but will still continue to explore in the same...
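In other words (an illustrative sketch, not the repo's actual code): evaluation forces the greedy action, while training keeps the exploration noise even after the optimal policy has been learned.

```python
import random

import torch

def select_action(q_values: torch.Tensor, epsilon: float, deterministic: bool) -> int:
    # Evaluation (`deterministic=True`) always takes the greedy action;
    # training keeps exploring with probability `epsilon`.
    if deterministic or random.random() > epsilon:
        return int(q_values.argmax())
    return random.randrange(q_values.numel())
```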
> When we use a CNN for Atari envs, we get a feature vector by applying that CNN to the state representation and then use an MLP accordingly on that feature vector...
This is up for grabs!
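For anyone picking this up, a minimal sketch of the idea in the quote above, using the classic DQN shapes for a stacked (4, 84, 84) Atari observation (illustrative only, not the repo's implementation):

```python
import torch
import torch.nn as nn

class CnnMlpPolicy(nn.Module):
    """The CNN turns the frame stack into a flat feature vector; an MLP head acts on it."""

    def __init__(self, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),  # (N, 64, 7, 7) -> (N, 3136) feature vector
        )
        self.mlp = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.mlp(self.features(obs))

q_values = CnnMlpPolicy(n_actions=6)(torch.zeros(1, 4, 84, 84))  # shape (1, 6)
```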
Are you done here, @hades-rp2010? If you can resolve the merge conflicts and maybe the CodeClimate issues, then we can merge this.
Not sure. I checked a couple of files here and there and they weren't. Feel free to remove them from the list if they're already done. There's a lot of...
It's just `batch_size` right now. Don't look at the older docstrings; just look at the function's init variables.
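One quick way to check this without trusting the docs, using a hypothetical class as a stand-in for the actual one:

```python
import inspect

class ReplayBuffer:  # hypothetical stand-in; the stale docstring is deliberate
    def __init__(self, batch_size: int = 32):
        """Older docstring might still mention a different name -- ignore it."""
        self.batch_size = batch_size

# The signature, not the docstring, is the source of truth:
print(inspect.signature(ReplayBuffer.__init__))  # (self, batch_size: int = 32)
```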
Sure!
Tbh, shifting to scientific notation doesn't sound like a good idea, for the simple reason that it doesn't look good. The current logger takes care of the fixed-length problem...
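For comparison, plain fixed-width float formatting (not the repo's logger) already keeps columns aligned without switching notation:

```python
value = 1234.56789

print(f"{value:12.4e}")  # '  1.2346e+03' -- fixed width, but harder to scan
print(f"{value:12.3f}")  # '    1234.568' -- fixed width and reads naturally
```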