Danijar Hafner
@miyosuda That video seems like the agent memorized the environment. I think the paper authors use random starts to create a fairer evaluation. They sample a number from 0 to...
I see, so the agent just learned a good behavior that results in very repetitive episodes.
Doesn't subtracting the mean from the advantages have the effect of an entropy regularizer? Ignoring the clipping, the objective is `logp * (adv - mean) / std = logp *...
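A quick numerical check of the algebra behind that question, as a rough sketch only. The arrays `logp` and `adv` are made-up sample data, not values from any real training run, and the clipping is ignored as in the comment. It assumes the actions were sampled from the current policy, so that `-logp.mean()` acts as a Monte Carlo entropy estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: log-probabilities of sampled actions and their advantages.
logp = rng.normal(-1.0, 0.5, size=1000)
adv = rng.normal(2.0, 1.0, size=1000)

mean, std = adv.mean(), adv.std()

# Objective with mean-subtracted, std-scaled advantages (no clipping).
normalized = (logp * (adv - mean) / std).mean()

# Same objective with std scaling only, plus a separate entropy-like term.
scaled_only = (logp * adv / std).mean()
entropy_estimate = -logp.mean()  # sample-based estimate, assuming on-policy actions
decomposed = scaled_only + (mean / std) * entropy_estimate

print(normalized, decomposed)
```

The two printed values agree, so subtracting the mean is algebraically the same as adding `mean / std` times the entropy estimate. Whether that term rewards or penalizes entropy depends on the sign of the batch mean.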
What's the reason for dumping the buffer to a text file before searching it? I'd be happy to help make this faster. The plugin is a great idea but right...
Would it be possible to bind the keys (maybe different ones to avoid conflicts) once during startup rather than every time copy mode is entered?
I see how it makes sense for code cells that Ctrl+Return runs the cell and stays in Vim command mode. But for markdown cells, the command doesn't do anything right...
Still occurs sometimes: `(sqlite3.OperationalError) database is locked`
Please email me with requests for the flights dataset. I'm not sure if we're allowed to publish the pre-processed dataset.
Thanks for the detailed comment! I agree that averaging is nicer than taking the maximum, but at this point it's more important to be compatible with the vast existing literature...
Hi @JesseFarebro, do you have an idea for a workaround here? It's the only issue holding me back from switching to `ale-py` and the new V5 envs. To implement max...