agents
TF-Agents: A reliable, scalable, and easy-to-use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Hi, the `wrap_env` function allows combining a time limit with `auto_reset` disabled, but `TimeLimit` resets the environment without checking this flag. Note: it seems that the `TimeLimit` implementation does not match its documentation, as...
We want to implement RL on an Android device. Just wondering whether it is possible to run tf-agents on Android, or to convert tf-agents models to TF-Lite. It would be great if...
Hello, I found a performance issue in `tf_agents/utils/example_encoding_dataset.py`: [dataset = dataset.map(decode_fn)](https://github.com/tensorflow/agents/blob/b4505ed5021f66c6b7f43b7a082eb5ae8fe41af7/tf_agents/utils/example_encoding_dataset.py#L245) is called without **num_parallel_calls**. I think it would improve the efficiency of your program if you added this. The...
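A minimal sketch of the suggested change, using a toy decode function and dataset as stand-ins for the real decoder in `example_encoding_dataset.py` (the names `decode_fn` and `dataset` below are illustrative, not the actual code):

```python
import tensorflow as tf

# Toy stand-in for the serialized-example decoding step;
# decode_fn here just squares its input element.
def decode_fn(x):
    return x * x

dataset = tf.data.Dataset.range(8)

# Sequential map: elements are decoded one at a time.
sequential = dataset.map(decode_fn)

# Parallel map: tf.data may run decode_fn on several elements at once,
# with AUTOTUNE letting the runtime pick the degree of parallelism.
parallel = dataset.map(decode_fn, num_parallel_calls=tf.data.AUTOTUNE)

print([int(x) for x in parallel])
```

The results are identical either way; only the scheduling of `decode_fn` calls changes, which is why the change is a safe efficiency win when decoding is the bottleneck.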
I couldn't find any references in the documentation regarding support for learning under delayed feedback (https://sites.ualberta.ca/~szepesva/papers/DelayedOnlineLearning.pdf). For example, in a simple batch-oriented use case with multi-armed bandits, is there a...
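To make the delayed-feedback setting concrete, here is a small pure-Python sketch of a multi-armed bandit whose rewards only become observable several rounds after the arm is pulled (the arm means, delay length, and epsilon-greedy rule are illustrative assumptions, not TF-Agents APIs):

```python
import random
from collections import deque

random.seed(0)

ARMS = 3
TRUE_MEANS = [0.2, 0.5, 0.8]   # hypothetical Bernoulli reward probabilities
DELAY = 5                      # feedback arrives 5 rounds late
EPSILON = 0.1

counts = [0] * ARMS
values = [0.0] * ARMS
pending = deque()              # (arm, reward) pairs awaiting delivery

for t in range(2000):
    # epsilon-greedy choice based on the current (stale) estimates
    if random.random() < EPSILON:
        arm = random.randrange(ARMS)
    else:
        arm = max(range(ARMS), key=lambda a: values[a])
    reward = 1.0 if random.random() < TRUE_MEANS[arm] else 0.0
    pending.append((arm, reward))

    # only the feedback from DELAY rounds ago is observed now
    if len(pending) > DELAY:
        old_arm, old_reward = pending.popleft()
        counts[old_arm] += 1
        # incremental running-mean update of the arm's value estimate
        values[old_arm] += (old_reward - values[old_arm]) / counts[old_arm]

best = max(range(ARMS), key=lambda a: values[a])
print("estimated best arm:", best)
```

The point of the sketch is that the update loop is decoupled from the action loop by the `pending` queue; any batch-oriented delayed-feedback scheme would need a similar buffer between acting and learning.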
Hi team, thank you for this great package! I have a question related to the value of StepType when we have a relatively short trajectory. So based on my understanding,...
The function `sample_spec_nest` currently raises a TypeError if any of the specs has dtype bool. For example, the code below:
```
import tensorflow as tf
from tf_agents.specs import tensor_spec
spec...
```
I'm trying to use my own custom OpenAI gym with tf-agents. So I load it with suite_gym from tf-agents:
```
env = suite_gym.load(env_name, max_episode_steps=max_episode_steps)
train_env = tf_py_environment.TFPyEnvironment(env)
```
But my...
This PR addresses issue #620, and the implementation includes:
- [x] Tabular agents using Dynamic Programming and Temporal Difference
- [x] Unit testing for tabular agents
- ...
Hello, I'm trying to implement the PPO agent using a custom environment with a `Discrete` space object with bounds [0, 4), but the agent's policy is choosing a number out of...
It seems to be impossible to use the `ActorRNNNetwork` with stacked LSTM layers. I noticed this using a custom environment, but was able to reproduce the problem with an official...