
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Results: 174 agents issues

I am trying to adapt the SAC minitaur tutorial, which uses the Actor-Learner API and Reverb, to work with the PPO agent. I changed the `tf_agent` from `sac_agent.SacAgent` to the...

Hey everyone! I've recently been playing with linear bandits, and during one of my experiments I found that neither the [LinTS](https://github.com/tensorflow/agents/blob/master/tf_agents/bandits/agents/linear_thompson_sampling_agent.py) nor the [LinUCB](https://github.com/tensorflow/agents/blob/master/tf_agents/bandits/agents/lin_ucb_agent.py) agent works with `tikhonov_weight=0`. While there...
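For context, here is a plain-Python sketch (the helpers `det2` and `build_a` are made up, not the TF-Agents implementation) of why a zero Tikhonov weight can break these agents: LinUCB and LinTS invert a covariance matrix of the form A = λI + Σ xᵢxᵢᵀ, and with λ = 0 that matrix is singular until enough linearly independent contexts have been observed.

```python
# Hypothetical sketch: LinUCB / LinTS maintain A = lam * I + sum_i x_i x_i^T
# and invert it. With tikhonov_weight (lam) = 0 and fewer contexts than
# dimensions, A is rank-deficient and the inverse does not exist.

def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def build_a(contexts, lam, dim=2):
    """A = lam * I + sum of outer products x x^T."""
    a = [[lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for x in contexts:
        for i in range(dim):
            for j in range(dim):
                a[i][j] += x[i] * x[j]
    return a

contexts = [[1.0, 2.0]]                    # a single observed context vector

print(det2(build_a(contexts, lam=0.0)))    # 0.0 -> singular, inversion fails
print(det2(build_a(contexts, lam=0.1)))    # ~0.51 -> Tikhonov term restores full rank
```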

bandits

I'm trying to use a DDPG agent with actor and critic networks, and a TFUniform replay buffer, training on my custom environment. I've extracted a training experience from the buffer...
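As a point of reference, here is a toy uniform replay buffer in plain Python (hypothetical; `TFUniformReplayBuffer` itself stores nested tensors and is typically sampled via `as_dataset`) showing the `[batch, time]` layout of experience extracted for training:

```python
import random

class UniformReplayBuffer:
    """Toy stand-in for a uniform replay buffer (not the TF-Agents class)."""

    def __init__(self, max_length):
        self.max_length = max_length
        self.data = []

    def add(self, transition):
        self.data.append(transition)
        if len(self.data) > self.max_length:
            self.data.pop(0)  # drop the oldest transition

    def sample(self, batch_size, num_steps):
        """Return batch_size windows of num_steps consecutive transitions."""
        starts = [random.randrange(len(self.data) - num_steps + 1)
                  for _ in range(batch_size)]
        return [self.data[s:s + num_steps] for s in starts]

buf = UniformReplayBuffer(max_length=100)
for t in range(10):
    buf.add({"obs": t, "action": t % 2, "reward": 1.0})

batch = buf.sample(batch_size=4, num_steps=2)
print(len(batch), len(batch[0]))  # 4 2 -> [batch, time] structure
```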

Training the agent often fails with the message "Loss is inf or nan". I found another thread where missing normalization was the culprit. I don't know what that is about, I...
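For what it's worth, a minimal sketch of the kind of observation normalization that thread refers to (plain Python, `normalize` is a hypothetical helper): inputs with large magnitudes can blow the loss up to inf/nan, while shifting to zero mean and unit variance keeps values in a tame range.

```python
def normalize(batch, eps=1e-8):
    """Standardize a batch of scalar observations to zero mean, unit variance."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / ((var + eps) ** 0.5) for x in batch]

raw = [1000.0, 2000.0, 3000.0]   # large-scale observations
scaled = normalize(raw)
print(scaled)  # roughly [-1.22, 0.0, 1.22]
```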

I see that the existing CategoricalProjectionNetwork supports only the same number of actions along all dimensions. So, for example, discrete actions `[3, 3, 3, 3]` -- good. Discrete actions: `[3,...
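Until per-dimension action counts are supported, one common workaround (sketched below in plain Python; `encode`/`decode` are hypothetical helpers, not TF-Agents APIs) is to flatten a mixed space such as `[3, 4, 2]` into a single discrete action of size 3 · 4 · 2 = 24 via mixed-radix encoding:

```python
def encode(action, sizes):
    """Map a per-dimension action tuple to one flat integer (mixed radix)."""
    flat = 0
    for a, n in zip(action, sizes):
        flat = flat * n + a
    return flat

def decode(flat, sizes):
    """Inverse of encode: recover the per-dimension action tuple."""
    action = []
    for n in reversed(sizes):
        action.append(flat % n)
        flat //= n
    return tuple(reversed(action))

sizes = [3, 4, 2]
print(encode((2, 3, 1), sizes))   # 23, the last action in the flat space
print(decode(23, sizes))          # (2, 3, 1)
```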

I have been trying to implement a PPO agent that solves LunarLander-v2, as in the official example in the GitHub repo: https://github.com/tensorflow/agents/blob/master/tf_agents/agents/ppo/examples/v2/train_eval_clip_agent.py In this example, a PPOClip agent is used...
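For reference, the per-sample quantity a PPO-Clip agent maximizes is the clipped surrogate min(r·A, clip(r, 1−ε, 1+ε)·A), where r is the new/old policy probability ratio and A the advantage. A tiny plain-Python illustration with made-up numbers:

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate objective."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)

# A large policy change is clipped, capping the incentive at 1.2 * A:
print(ppo_clip_objective(ratio=1.5, advantage=1.0))   # 1.2
# With a negative advantage, min() keeps the pessimistic (clipped) value:
print(ppo_clip_objective(ratio=0.5, advantage=-1.0))  # -0.8
```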

Browsing through freely available sources, I find both statements: DQN is good / is not good for stochastic environments. As far as I understand it, the Q-network predicts the expected...
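On the expectation point: the Q-network is trained toward E[r + γ·max_a′ Q(s′, a′)], so in a stochastic environment it learns the mean outcome of an action, not its spread. A tiny worked example with made-up numbers:

```python
def expected_q(outcomes, gamma=0.99):
    """Expected target value; outcomes: (probability, reward, max_next_q) tuples."""
    return sum(p * (r + gamma * q) for p, r, q in outcomes)

# One action yields +10 or -10 with equal probability; another always yields +1.
risky = [(0.5, 10.0, 0.0), (0.5, -10.0, 0.0)]
safe = [(1.0, 1.0, 0.0)]

print(expected_q(risky))  # 0.0 -> DQN learns the mean, not the variance
print(expected_q(safe))   # 1.0 -> the safe action looks better in expectation
```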

Hi Team, I'm trying to run the Actor-Learner API for Distributed Collection and Training as explained here: https://github.com/tensorflow/agents/tree/master/tf_agents/experimental/distributed/examples/sac but on multiple machines. Based on the Reverb docs, let's say I have...

This issue tracks the feature request to add tabular agents to TF-Agents:
- [ ] Tabular agents using Dynamic Programming and Temporal Difference
- [ ] Unit Testing...
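A minimal plain-Python sketch of the tabular TD(0) Q-learning update this request describes (the parameter values and state names are arbitrary):

```python
def q_learning_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[s_next].values()) if s_next in q else 0.0
    td_target = r + gamma * best_next
    q[s][a] += alpha * (td_target - q[s][a])
    return q

# Toy table: moving "right" from s0 reaches s1, where "left" is worth 1.0.
q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 0.0}}
q_learning_update(q, "s0", "right", r=0.0, s_next="s1")
print(q["s0"]["right"])  # 0.45 = 0.5 * (0 + 0.9 * 1.0)
```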

I am trying to use an action mask with a DQNAgent and, for the most part, have succeeded. But I have a persistent warning that I can't get rid of...
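For anyone else hitting this: the usual idea behind action masking (plain-Python sketch below; DqnAgent's `observation_and_action_constraint_splitter` applies the equivalent on tensors) is to push the Q-values of invalid actions to −∞ before taking the argmax:

```python
def masked_argmax(q_values, mask):
    """Greedy action over allowed actions only; mask[i] == 1 means allowed."""
    masked = [q if m else float("-inf") for q, m in zip(q_values, mask)]
    return max(range(len(masked)), key=masked.__getitem__)

q_values = [0.3, 0.9, 0.1]
mask = [1, 0, 1]                      # action 1 is currently illegal
print(masked_argmax(q_values, mask))  # 0 -> best among the allowed actions
```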