Danijar Hafner comments

Results: 165 comments of Danijar Hafner

Training threads don't start on Windows

I've seen it working on many people's computers :) Please check if YAML is installed:

```shell
python3 -c "import ruamel.yaml; print('success')"
```

And check if the Pendulum environment works: ```shell...

Training threads don't start on Windows

Yea, that might be the problem. Processing is quite different between Windows and Linux/Mac and we mainly tested on the latter. I'm afraid I can't be of much help since...

Training threads don't start on Windows

@donamin Were you able to narrow down this issue?

Training threads don't start on Windows

Thanks for getting back. I'll keep this issue open for now. We might support Windows in the future since as far as I can see the threading is the only...

Training threads don't start on Windows

@erwincoumans Yes, this seems trivial since `self._worker()` does not access any object state. You'd just have to replace the occurrences of `self` with `ExternalProcess`. I'd be happy to accept a...
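A minimal sketch of the change suggested in this comment, not the project's actual code: the worker becomes a static method and is referenced through the class, so the process target is a plain function rather than a bound method. This matters on Windows, where `multiprocessing` starts processes with "spawn" and has to pickle the target. The class body, the message constants, and the `call`/`close` methods here are simplified assumptions for illustration.

```python
import multiprocessing


class ExternalProcess:
    """Hypothetical, simplified stand-in for the class discussed above."""

    # Message types sent over the pipe (names are assumptions).
    _CALL = 1
    _CLOSE = 2

    def __init__(self, constructor):
        # `constructor` must itself be picklable under "spawn",
        # e.g. a module-level class or function.
        self._conn, child_conn = multiprocessing.Pipe()
        # Reference the worker through the class instead of `self`, so the
        # target is a plain function and the instance is not pickled.
        self._process = multiprocessing.Process(
            target=ExternalProcess._worker, args=(constructor, child_conn))
        self._process.start()
        # Drop the parent's copy of the child end of the pipe.
        child_conn.close()

    def call(self, name, *args):
        """Invoke a method on the remote object and return its result."""
        self._conn.send((ExternalProcess._CALL, (name, args)))
        return self._conn.recv()

    def close(self):
        """Ask the worker loop to exit and wait for the process."""
        self._conn.send((ExternalProcess._CLOSE, None))
        self._process.join()

    @staticmethod
    def _worker(constructor, conn):
        # Uses only its arguments and class-level constants, no `self`.
        obj = constructor()
        while True:
            message, payload = conn.recv()
            if message == ExternalProcess._CALL:
                name, args = payload
                conn.send(getattr(obj, name)(*args))
            elif message == ExternalProcess._CLOSE:
                break
        conn.close()
```

When trying this on Windows, instances should be created under an `if __name__ == '__main__':` guard, since "spawn" re-imports the main module in the child process.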

GPU doesn't seem to work

Hi @fengredrum. In case this is still an issue, could you try wrapping your network implementation in a `with tf.device('/gpu:0')` block?

GPU doesn't seem to work

Thanks for providing more details. I don't think the replay buffer should be placed on GPU, since it can grow quite large, especially when training from pixel observations. All ops...

GPU doesn't seem to work

@colinskow Could you try running without environment processes (`--noenv_processes`), please? When there is a crash in one of the processes it can cause the program to deadlock before anything is...

Distributed training with Kubernetes

This is an interesting topic. I would imagine that a simple distribution pattern would be to run multiple instances of the current code on multiple machines, each simulating their own...

Distributed training with Kubernetes

I think in many scenarios it makes sense to simulate and train on the same machine, and just scale the number of those machines. That's mainly because it seems to...