Sergio Guadarrama

Results 68 comments of Sergio Guadarrama

What I meant is that one should use the same policy to collect the data that is used for training. Mixing policies between algorithms is not guaranteed to work.

I'm not sure what code you are running, or what you mean by "using greedy policy in training" (since that should only be used for eval). Are you using...

You can try overriding the dtype of the `step_type` in the Policy given to the Driver.
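A minimal sketch of the idea, using plain-Python stand-ins (the real classes are `tf_agents.trajectories.TimeStep` and a `TFPolicy` subclass; the wrapper and names here are hypothetical, just to show where the cast goes):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TimeStep:
    # Stand-in for tf_agents.trajectories.TimeStep.
    step_type: object
    reward: float
    discount: float
    observation: object

class StepTypeCastingPolicy:
    """Hypothetical wrapper: casts step_type to the dtype the
    wrapped policy expects before delegating the action call."""
    def __init__(self, wrapped_policy, cast_fn):
        self._wrapped = wrapped_policy
        self._cast = cast_fn

    def action(self, time_step):
        fixed = replace(time_step, step_type=self._cast(time_step.step_type))
        return self._wrapped.action(fixed)

# Toy policy that requires an int step_type.
class ToyPolicy:
    def action(self, time_step):
        assert isinstance(time_step.step_type, int)
        return 0  # constant action

policy = StepTypeCastingPolicy(ToyPolicy(), cast_fn=int)
ts = TimeStep(step_type=1.0, reward=0.0, discount=1.0, observation=None)
print(policy.action(ts))  # the wrapped policy now sees an int step_type
```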

Unfortunately the DynamicDriver has dynamic shapes and doesn't allow jit-compilation; you can compile the Network or the Policy, though.
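For instance, you can wrap just the forward computation in a `tf.function` with `jit_compile=True` while the driver loop stays uncompiled (the `forward` function below is a toy stand-in for a network or policy call):

```python
import tensorflow as tf

# Toy stand-in for a network/policy forward pass; the real call
# would be e.g. network.__call__ or policy.action.
@tf.function(jit_compile=True)  # XLA-compiles only this function
def forward(observation):
    return tf.reduce_sum(observation, axis=-1)

print(forward(tf.ones([2, 3])).numpy())  # [3. 3.]
```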

Take a look at this [example](https://github.com/tensorflow/agents/blob/93c6b1b40e869e27f6bbaaa1d6cc8d24ff367cb9/tf_agents/policies/policy_saver.py#L91):

```
saved_policy = tf.compat.v2.saved_model.load('policy_0')
time_step = ...
while True:
  policy_step = saved_policy.action(time_step)
  time_step = f(policy_step.action)
```

Instead of modifying the encoding network, we recommend using `NestMap` to create networks with multiple inputs. See examples [here](https://github.com/tensorflow/agents/blob/76397a546c1f8bdea1d7690c878fb95e874751a8/tf_agents/networks/nest_map_test.py#L71) and [here](https://github.com/tensorflow/agents/blob/488e5399db40102dae256932f6c69343f6849128/tf_agents/examples/sac/haarnoja18/sac_train_eval.py#L81).

If you want to pass multiple inputs to the same Network (assuming the Network knows how to handle multiple inputs), what you need to do is nest the inputs appropriately...
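As a rough illustration of the nesting idea in plain Python (the network class here is hypothetical; in TF-Agents the nest structure of the observation must match the network's `input_tensor_spec`):

```python
class TwoInputNetwork:
    """Hypothetical network that expects its input as a dict nest
    with keys 'image' and 'state'."""
    def __call__(self, inputs):
        # The network unpacks the nest itself; callers just pass
        # a matching structure instead of positional arguments.
        return sum(inputs['image']) + sum(inputs['state'])

net = TwoInputNetwork()
# Nest the inputs to match the structure the network expects,
# rather than modifying the network to take multiple arguments.
observation = {'image': [1, 2, 3], 'state': [10, 20]}
print(net(observation))  # 36
```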

If you update tensor2tensor to 1.13.1 it should work: that release removed the tf-agents dependency until we fix the issue. See https://github.com/tensorflow/tensor2tensor/commit/a4071d62f510a3b0dace62f9fa78e2f9a60c5c40#diff-2eeaed663bd0d25b7e608891384b7298

Thanks for your contribution! I think it would be great to get this as a PR with a simple example using ParallelEnv. I would probably suggest renaming `UnbatchingObserver` to `BatchedObserverUnbatching`...

Can you make sure that the actors are generating the data that the learner needs? For instance, can you get data by doing:

```
next(learner._experience_iterator)
```