5G-Federation

Bonus-based Exploration

Open Bahador-Bakhshi opened this issue 4 years ago • 0 comments

Q(s, a) ← (1 − α) Q(s, a) + α [r + γ ⋅ max_{a′} f(Q(s′, a′), N(s′, a′))]

In this equation:

  • N(s′, a′) counts the number of times the action a′ was chosen in state s′.

  • f(Q, N) is an exploration function, such as f(Q, N) = Q + κ/(1 + N), where κ is a curiosity hyperparameter that measures how much the agent is attracted to the unknown.
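
A minimal Python sketch of this update, assuming a tabular setting with hashable states and a fixed action list. The class name `BonusQLearner`, its method names, and the default hyperparameter values are hypothetical and chosen only for illustration. Selecting actions greedily with respect to f, and incrementing N at selection time, is also an assumption; the equation above only specifies the update target. Terminal states are not handled.

```python
from collections import defaultdict


def exploration_value(q, n, kappa):
    """Exploration function f(Q, N) = Q + kappa / (1 + N)."""
    return q + kappa / (1 + n)


class BonusQLearner:
    """Tabular Q-learning with a count-based exploration bonus (illustrative sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, kappa=1.0):
        self.actions = list(actions)
        self.alpha = alpha    # learning rate
        self.gamma = gamma    # discount factor
        self.kappa = kappa    # curiosity hyperparameter
        self.Q = defaultdict(float)  # Q[(state, action)], defaults to 0.0
        self.N = defaultdict(int)    # N[(state, action)], defaults to 0

    def bonus_value(self, state, action):
        key = (state, action)
        return exploration_value(self.Q[key], self.N[key], self.kappa)

    def choose_action(self, state):
        # Assumption: act greedily on f(Q, N); ties go to the earliest action in the list.
        action = max(self.actions, key=lambda a: self.bonus_value(state, a))
        self.N[(state, action)] += 1
        return action

    def update(self, state, action, reward, next_state):
        # Target: r + gamma * max_{a'} f(Q(s', a'), N(s', a'))
        best_next = max(self.bonus_value(next_state, a) for a in self.actions)
        target = reward + self.gamma * best_next
        key = (state, action)
        self.Q[key] = (1 - self.alpha) * self.Q[key] + self.alpha * target


if __name__ == "__main__":
    agent = BonusQLearner(["left", "right"], alpha=0.5, gamma=0.9, kappa=2.0)
    agent.update("s0", "left", reward=1.0, next_state="s1")
    print(agent.Q[("s0", "left")])
```

Because the bonus κ/(1 + N) shrinks as a state-action pair is visited more often, rarely tried actions look temporarily more valuable, and f(Q, N) approaches Q as N grows.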

Bahador-Bakhshi, Nov 24 '20 13:11