5G-Federation
Bonus-based Exploration
Q(s, a) ← (1 − α) Q(s, a) + α [r + γ ⋅ max_{a′} f(Q(s′, a′), N(s′, a′))]
In this equation:
N(s′, a′) counts the number of times the action a′ was chosen in state s′.
f(Q, N) is an exploration function, such as f(Q, N) = Q + κ/(1 + N), where κ is a curiosity hyperparameter that measures how much the agent is attracted to the unknown.
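The update rule above can be sketched in a few lines of Python. This is a minimal tabular illustration, not a reference implementation: the function names (exploration_value, bonus_q_update), the dictionary-based tables, the default hyperparameter values, and the choice to increment N(s, a) inside the update are all assumptions made for the example.

```python
from collections import defaultdict


def exploration_value(q, n, kappa):
    """Exploration function f(Q, N) = Q + kappa / (1 + N)."""
    return q + kappa / (1 + n)


def bonus_q_update(Q, N, s, a, r, s_next, actions,
                   alpha=0.1, gamma=0.9, kappa=1.0):
    """Apply one bonus-based Q-learning update for the transition (s, a, r, s_next).

    Q and N are tables keyed by (state, action) pairs.
    Assumption: the visit count for (s, a) is incremented here,
    at the moment the transition is processed.
    """
    N[(s, a)] += 1

    # max over a' of f(Q(s', a'), N(s', a')): rarely tried actions get a larger bonus.
    best_next = max(
        exploration_value(Q[(s_next, a2)], N[(s_next, a2)], kappa)
        for a2 in actions
    )

    # Q(s, a) <- (1 - alpha) Q(s, a) + alpha [r + gamma * best_next]
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]


# Hypothetical usage: two actions, all estimates and counts start at zero.
Q = defaultdict(float)
N = defaultdict(int)
new_q = bonus_q_update(Q, N, s=0, a=0, r=1.0, s_next=1,
                       actions=[0, 1], alpha=0.5, gamma=0.9, kappa=2.0)
print(new_q)
```

In this usage, neither action has been tried in state 1, so each receives the full bonus κ/(1 + 0) = 2 and the updated estimate is about 0.5 ⋅ (1 + 0.9 ⋅ 2) = 1.4. As N(s′, a′) grows, the bonus shrinks toward zero and the rule approaches standard Q-learning.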