
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Results: 174 agents issues (sorted by recently updated)

Hi, the wrap_env function allows combining a time limit with auto_reset disabled, but TimeLimit resets the environment without checking this flag. Note: it seems that the TimeLimit implementation mismatches the documentation, as...

We want to implement RL on an Android device. Just wondering if it is possible to run tf-agents on Android or to convert tf-agents to tf-lite. It would be great if...

Hello, I found a performance issue in `tf_agents/utils/example_encoding_dataset.py`: [dataset = dataset.map(decode_fn)](https://github.com/tensorflow/agents/blob/b4505ed5021f66c6b7f43b7a082eb5ae8fe41af7/tf_agents/utils/example_encoding_dataset.py#L245) is called without **num_parallel_calls**. I think adding it would improve the program's efficiency. The...
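The suggested one-argument fix can be sketched as follows; `decode_fn` here is a stand-in for the real record decoder in that file, and `AUTOTUNE` lets the tf.data runtime pick the parallelism level:

```python
import tensorflow as tf

# Stand-in for the decode function in example_encoding_dataset.py;
# the real one parses serialized Example protos.
def decode_fn(x):
    return x * 2

dataset = tf.data.Dataset.range(4)
# The suggested fix: pass num_parallel_calls so tf.data runs the map
# function on multiple elements concurrently instead of sequentially.
dataset = dataset.map(decode_fn, num_parallel_calls=tf.data.AUTOTUNE)
print(list(dataset.as_numpy_iterator()))  # [0, 2, 4, 6]
```

By default tf.data keeps the output order deterministic even with parallel calls, so this change should not alter results, only throughput.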

I couldn't find any references in the documentation regarding support for learning under delayed feedback (https://sites.ualberta.ca/~szepesva/papers/DelayedOnlineLearning.pdf). For example, in a simple batch-oriented use case with multi-armed bandits, is there a...

type:feature request
bandits

Hi team, thank you for this great package! I have a question about the value of StepType when we have a relatively short trajectory. Based on my understanding,...

The function `sample_spec_nest` currently raises a TypeError if any of the specs has dtype bool. For example, the code below: ``` import tensorflow as tf from tf_agents.specs import tensor_spec spec...
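Until bool specs are supported, one possible workaround (an assumption on my part, not taken from the issue) is to sample integers in {0, 1} with plain TensorFlow and cast, since `tf.random.uniform` itself has no bool dtype:

```python
import tensorflow as tf

# Workaround sketch: tf.random.uniform cannot produce bools directly,
# so draw ints in {0, 1} and cast the result to tf.bool.
def sample_bool(shape, seed=None):
    ints = tf.random.uniform(shape, minval=0, maxval=2, dtype=tf.int32, seed=seed)
    return tf.cast(ints, tf.bool)

samples = sample_bool((3,))
print(samples.dtype, samples.shape)  # <dtype: 'bool'> (3,)
```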

I'm trying to use my own custom OpenAI Gym environment with tf-agents. So I load it with suite_gym from tf-agents: `env = suite_gym.load(env_name, max_episode_steps=max_episode_steps)` `train_env = tf_py_environment.TFPyEnvironment(env)` But my...

This PR addresses issue #620, and the implementation includes:
- [x] Tabular agents using Dynamic Programming and Temporal Difference
- [x] Unit testing for tabular agents
- ...

cla: yes

Hello, I'm trying to implement the PPO agent using a custom environment with a Discrete spaces object with bounds [0, 4), but the agent's policy is choosing a number out of...

It seems to be impossible to use the ActorRNNNetwork with stacked LSTM layers. I noticed this with a custom environment but was able to reproduce the problem with an official...