Bahador Bakhshi

Results 22 issues of Bahador Bakhshi

Different approaches can be used to decay alpha and epsilon, for example alpha = alpha0 / (1 + iteration * decay)
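As a rough illustration, the inverse-time schedule above could be written as below. The exponential variant with a floor is an added alternative, not something stated in the issue, and the function names, `rate`, and `minimum` are hypothetical.

```python
def inverse_time_decay(initial, iteration, decay):
    """Schedule from the text: initial / (1 + iteration * decay)."""
    return initial / (1.0 + iteration * decay)


def exponential_decay(initial, iteration, rate, minimum=0.0):
    """Alternative schedule (an assumption): multiply by `rate` each iteration, never below `minimum`."""
    return max(minimum, initial * rate ** iteration)


alpha0, epsilon0 = 0.5, 1.0
for iteration in (0, 10, 100, 1000):
    alpha = inverse_time_decay(alpha0, iteration, decay=0.01)
    epsilon = exponential_decay(epsilon0, iteration, rate=0.99, minimum=0.05)
    print(iteration, round(alpha, 4), round(epsilon, 4))
```

Inverse-time decay shrinks slowly and never reaches zero, while the exponential form drops quickly and relies on the floor to keep some exploration.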

If the consumer domain can choose the overcharging, it may sometimes prefer to overcharge in order to keep resources for other demands

Q(s, a) ← (1 - α) Q(s, a) + α [r + γ ⋅ max_{a'} f(Q(s', a'), N(s', a'))]

In this equation:
- N(s', a') counts the number of times the...
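A minimal sketch of this update, under stated assumptions: the form of f is not given in the excerpt, so `exploration_value` below uses a hypothetical optimistic bonus `bonus / (n + 1)`, and N is assumed to be a per-(state, action) visit counter that is incremented when the pair is updated.

```python
from collections import defaultdict


def exploration_value(q, n, bonus=1.0):
    """Hypothetical f(Q, N): add a bonus that shrinks as the visit count grows."""
    return q + bonus / (n + 1)


def update(Q, N, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, bonus=1.0):
    """Apply Q(s,a) <- (1-alpha) Q(s,a) + alpha [r + gamma * max_a' f(Q(s',a'), N(s',a'))]."""
    N[(s, a)] += 1  # assumption: count the pair being updated
    best = max(
        exploration_value(Q[(s_next, a2)], N[(s_next, a2)], bonus)
        for a2 in actions
    )
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best)
    return Q[(s, a)]


Q = defaultdict(float)  # unseen pairs start at 0.0
N = defaultdict(int)    # unseen pairs start at 0 visits
new_value = update(Q, N, s=0, a="left", r=1.0, s_next=1, actions=["left", "right"])
print(new_value)
```

With this choice of f, rarely visited successor actions look more valuable than their current estimate, which pushes the agent toward them without a separate epsilon.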

Instead of an MDP with Q/R Learning, a contextual bandit may also be applicable

Dyna needs fewer episodes to converge. It seems that, in large problems, it is really beneficial to use it instead of direct Q-Learning
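One way to see where the savings could come from is a tabular Dyna-Q step: each real transition is followed by several simulated backups replayed from a learned model. This is only a sketch that assumes a deterministic environment and ignores terminal states; `dyna_q_step` and its parameters are illustrative names.

```python
import random
from collections import defaultdict


def dyna_q_step(Q, model, s, a, r, s_next, actions,
                alpha=0.1, gamma=0.95, planning_steps=10, rng=random):
    """One Dyna-Q step: direct Q-Learning backup, model update, then planning backups."""

    def backup(s0, a0, r0, s1):
        target = r0 + gamma * max(Q[(s1, b)] for b in actions)
        Q[(s0, a0)] += alpha * (target - Q[(s0, a0)])

    backup(s, a, r, s_next)       # learn from the real transition
    model[(s, a)] = (r, s_next)   # deterministic model: remember the last outcome
    for _ in range(planning_steps):
        ps, pa = rng.choice(list(model))  # sample a previously seen pair
        pr, pnext = model[(ps, pa)]
        backup(ps, pa, pr, pnext)         # learn from the simulated transition


Q = defaultdict(float)
model = {}
dyna_q_step(Q, model, 0, "go", 1.0, 1, actions=["go"], planning_steps=5)
print(Q[(0, "go")])
```

Setting `planning_steps=0` reduces this to direct Q-Learning, which makes the two easy to compare on the same problem.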

Double learning can be applied to QL, SARSA and Expected SARSA
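For the QL case, a double-learning update might look like the sketch below: two tables are kept, one picks the greedy next action and the other evaluates it, with the roles chosen by a coin flip. The function and parameter names are illustrative. The SARSA and Expected SARSA variants would change how the target is formed and are not shown.

```python
import random
from collections import defaultdict


def double_q_update(QA, QB, s, a, r, s_next, actions,
                    alpha=0.1, gamma=0.9, rng=random):
    """Double Q-Learning: select the next action with one table, evaluate it with the other."""
    if rng.random() < 0.5:
        learner, evaluator = QA, QB
    else:
        learner, evaluator = QB, QA
    best = max(actions, key=lambda b: learner[(s_next, b)])  # action selection
    target = r + gamma * evaluator[(s_next, best)]           # action evaluation
    learner[(s, a)] += alpha * (target - learner[(s, a)])


QA = defaultdict(float)
QB = defaultdict(float)
double_q_update(QA, QB, s=0, a="x", r=1.0, s_next=1, actions=["x", "y"])
print(QA[(0, "x")], QB[(0, "x")])
```

Decoupling selection from evaluation is what reduces the overestimation that a single max over noisy estimates tends to produce.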

Does Expected SARSA do better than QL?
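The two methods differ only in the bootstrap target, which the following sketch makes concrete. It assumes an epsilon-greedy policy and uses illustrative helper names. It does not answer which method performs better, which depends on the problem.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy over a list of Q-values."""
    n = len(q_values)
    greedy = max(range(n), key=lambda i: q_values[i])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def q_learning_target(r, gamma, next_q):
    """QL bootstraps from the best next action."""
    return r + gamma * max(next_q)


def expected_sarsa_target(r, gamma, next_q, epsilon):
    """Expected SARSA bootstraps from the policy-weighted average of next Q-values."""
    probs = epsilon_greedy_probs(next_q, epsilon)
    return r + gamma * sum(p * q for p, q in zip(probs, next_q))


next_q = [1.0, 3.0]
print(q_learning_target(0.0, 1.0, next_q))
print(expected_sarsa_target(0.0, 1.0, next_q, epsilon=0.2))
```

With `epsilon=0` the policy is greedy and the two targets coincide, so any difference in behaviour comes from how exploration is weighted in the target.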

Does SARSA do better than QL?