Michael Panchenko comments

Results: 187 comments of Michael Panchenko

"test_reward" KeyError in epoch_stat["test_reward"] when saving checkpoint through trainer information

This will be solved as part of #933. Typed returns with annotations will make it easier to track down the error.

Remove symlinks in repo

The symlinks are there for executing examples inside tests. I have the feeling that this mechanism should generally be adjusted, after which this issue may become obsolete. I'm putting it...

Support episodes of different lengths and running times

Episodes with varying lengths are an important feature, especially since gymnasium `step` can return `truncated=True`. There is no reason at all to expect this to be thrown at the same...
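
As a minimal sketch of why episode lengths vary, the toy environment below mimics the five-value return of gymnasium's `step` (`obs, reward, terminated, truncated, info`). `ToyEnv`, its `max_steps` time limit, the 10% termination chance, and `run_episode` are hypothetical names and values chosen for illustration; they are not part of gymnasium or tianshou.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for an environment with a gymnasium-style step()."""

    def __init__(self, max_steps, seed=0):
        self.max_steps = max_steps  # time limit that triggers truncation
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        # The task may end on its own at any step (assumed 10% chance).
        terminated = self.rng.random() < 0.1
        # Otherwise the episode is cut off once the time limit is reached.
        truncated = not terminated and self.t >= self.max_steps
        return self.t, 1.0, terminated, truncated, {}


def run_episode(env):
    """Step until the episode ends for either reason; return its length."""
    env.reset()
    length = 0
    while True:
        _, _, terminated, truncated, _ = env.step(action=0)
        length += 1
        if terminated or truncated:
            return length


env = ToyEnv(max_steps=20, seed=0)
lengths = [run_episode(env) for _ in range(50)]
```

Because an episode can stop through either flag, the entries of `lengths` differ from one another, so a collector cannot assume that all episodes finish at the same step.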

Support episodes of different lengths and running times

@ivanappliedai @xuzzo

[Questions] against PPO process_fn implementation: why not re-using forward's log_prob but re-compute instead?

There is a significant difference; it is a naming issue. For continuous envs, the output of forward is not a log-prob but the inputs of a distribution, typically mean and...
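
To illustrate the distinction, here is a minimal sketch in plain Python. It assumes a Gaussian policy whose forward pass returns a mean and a standard deviation; that parameterization, the placeholder numbers, and the names `actor_forward` and `gaussian_log_prob` are illustrative assumptions, not tianshou's API. The point is only that a log-probability exists once a specific action is evaluated under the resulting distribution.

```python
import math


def actor_forward(obs):
    """Hypothetical actor: returns distribution parameters, not a log-prob."""
    mean = 0.5 * obs  # placeholder for a network output
    std = 1.0         # placeholder for a learned or fixed scale
    return mean, std


def gaussian_log_prob(action, mean, std):
    """Log-density of `action` under a normal distribution with the given mean and std."""
    var = std ** 2
    return (
        -((action - mean) ** 2) / (2 * var)
        - math.log(std)
        - 0.5 * math.log(2 * math.pi)
    )


mean, std = actor_forward(obs=2.0)             # inputs of the distribution
log_prob = gaussian_log_prob(1.0, mean, std)   # log-prob of one specific action
```

In this sketch the value returned by `actor_forward` cannot be reused as a log-prob, since it does not depend on any action at all.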

Refactored pg.py logits variable name.

> REDQPolicy Class: Also here loc_scale maybe should be renamed? @MischaPanch
>
> https://github.com/thu-ml/tianshou/blob/4756ee80ff11cd8692aef3752f35c0af60a452e8/tianshou/policy/modelfree/redq.py#L147-L166
>
> What's wrong with the loc_scale?
>
> So basically for all actors and all policies that...

Create high level interfaces for config and experiments