genrl issues

Environment Wrappers

Add documentation on the types of available env wrappers

documentation

good first issue

Customizability - Environments

More detailed explanations for Policy and Value Functions

@DarylRodrigo pointed out that the current explanations of Policy and Value functions are not good. One of our aims is to tackle Accessibility. Quoting him: "the intro into policy and value...

Sharad24

Accessibility - Tutorials

Bellman equations

More explanations of Bellman equations and their relation to MDPs, and then to Q-Learning, etc. At the moment, we have almost nothing on Bellman equations! Thanks @DarylRodrigo
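For reference, the standard textbook forms such a tutorial would likely cover are sketched below. The notation is an assumption, not taken from the genrl docs: transition probabilities $P$, reward $R$, discount $\gamma$, step size $\alpha$.

```latex
% Bellman expectation equation for a policy \pi in an MDP
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]

% Bellman optimality equation for the action-value function
Q^{*}(s, a) = \sum_{s'} P(s' \mid s, a)
              \left[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \right]

% Q-Learning: a sampled, incremental version of the optimality backup
Q(s, a) \leftarrow Q(s, a) + \alpha
        \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

The Q-Learning update replaces the expectation over $s'$ in the optimality equation with a single observed transition $(s, a, r, s')$, which is the link between the MDP formulation and the learning rule.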

Sharad24

Accessibility - Tutorials

Reproducibility

Current training is not reproducible (observation based on `deep.py` in `examples`). Something is going wrong, and it is not clear why seeding is not doing the job here. Could be that we're missing...
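To illustrate the seeding concern, here is a minimal sketch of seeding the usual RNG sources in a PyTorch-based setup. `seed_everything` and `sample_rollout` are hypothetical helpers, not part of genrl, and the sketch does not claim to identify the actual cause in `deep.py`. Environment-level seeding and non-deterministic GPU kernels would still need to be checked separately.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the common RNG sources (hypothetical helper, not a genrl API)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy RNG
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU (and CUDA, if present) generators
        # Ask cuDNN for deterministic kernels; this may slow training down.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


def sample_rollout(n: int = 5) -> list:
    """Stand-in for a training run: draws n pseudo-random numbers."""
    return [random.random() for _ in range(n)]


# Two runs with the same seed should produce identical draws.
seed_everything(0)
first = sample_rollout()
seed_everything(0)
second = sample_rollout()
print(first == second)
```

If a check like this passes but training still diverges between runs, the remaining randomness probably comes from a source that was never seeded, such as the environment itself or a separately constructed generator.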

Sharad24

Priority:High

Core

Issue Tracker

We currently have a lot of issues, many of which are either incomplete, not relevant for the time being, need to be done later, or we...

sampreet-arthi

Evaluating performance of contextual bandit agents in examples

2

I have been playing around with the DCBTrainer and found some potential inconsistencies.

1) **StatlogData** example found [here](https://genrl.readthedocs.io/en/latest/usage/tutorials/bandit/contextual_overview.html)

```
from genrl.utils import StatlogDataBandit
bandit = StatlogDataBandit(download=True)
context = bandit.reset()
from...
```

TMorville

Value Iteration - Docs

Sharad24

documentation

good first issue

DM Env

Should we shift to using [DM Env](https://github.com/deepmind/dm_env)? We should probably evaluate this as a potential option.

Sharad24

Customizability - Environments

A2C and VPG do not train

3

Sharad24

bug

[WIP] Adding MultiAgent Utilities

10

AdityaKapoor74
