Bahador Bakhshi
In addition to the random policy, it is better to also use random initial states during the initial exploration phase. But how can this be implemented?
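One possible implementation, as a minimal sketch: force a random starting state at the beginning of each exploration episode and act randomly from there. It assumes a Gym-style environment; the `set_state` helper is an assumption, since not every environment lets you set its state directly.

```python
def collect_random_start_transitions(env, episodes):
    """Exploration phase: random initial state + random policy ("exploring starts")."""
    transitions = []
    for _ in range(episodes):
        env.reset()
        state = env.observation_space.sample()   # random initial state
        env.unwrapped.set_state(state)           # hypothetical setter; not all envs expose one
        done = False
        while not done:
            action = env.action_space.sample()   # random policy
            next_state, reward, done, info = env.step(action)
            transitions.append((state, action, reward, next_state, done))
            state = next_state
    return transitions
```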
"Table 10-1. Typical regression MLP architecture" in the "Hands on ..." provides a good summary of typical settings for MLP
In the DeepMind paper, they stated that they also keep exploring (using the ε-greedy approach with a very small ε) to avoid overfitting.
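For reference, a minimal ε-greedy action selection; the ε value below is illustrative, not the one used in the paper.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.05):
    """Random action with probability epsilon, greedy action otherwise.
    A small epsilon keeps a little exploration even at evaluation time."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))
```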
The n-step Tree Backup approach can also be used.
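As a reminder (following the standard formulation, e.g. Sutton & Barto), the n-step tree-backup return bootstraps through the target policy π at every step, so no importance sampling is needed:

```latex
% n-step tree-backup return (target policy \pi) and the resulting Q update
\begin{align*}
G_{t:t+1} &= R_{t+1} + \gamma \sum_{a} \pi(a \mid S_{t+1})\, Q(S_{t+1}, a) \\
G_{t:t+n} &= R_{t+1} + \gamma \sum_{a \neq A_{t+1}} \pi(a \mid S_{t+1})\, Q(S_{t+1}, a)
             + \gamma\, \pi(A_{t+1} \mid S_{t+1})\, G_{t+1:t+n} \\
Q(S_t, A_t) &\leftarrow Q(S_t, A_t) + \alpha \bigl[ G_{t:t+n} - Q(S_t, A_t) \bigr]
\end{align*}
```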
In the overcharging model, if a resource still has sufficient capacity, the demand should not be overcharged!
In general, initializing Q with large values (i.e., being more optimistic) can help achieve better exploration.
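A minimal sketch of optimistic initialization for a tabular Q; the table shape and the value 10.0 are placeholders.

```python
import numpy as np

n_states, n_actions = 100, 4      # placeholder sizes
OPTIMISTIC_VALUE = 10.0           # larger than any realistic return

# Unvisited (state, action) pairs look attractive, so even a greedy policy
# keeps trying them until their estimates are pulled down by real returns.
q_table = np.full((n_states, n_actions), OPTIMISTIC_VALUE)
```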
The paper "Average-reward model-free reinforcement learning" proposed/reviewed different approaches to approximating ρ (the average reward) in R-learning.
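For context, the basic R-learning updates (a sketch of the classic Schwartz-style rule, not of the specific variants discussed in that paper); ρ is the average-reward estimate, α and β are step sizes.

```python
import numpy as np

def r_learning_update(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One R-learning step on a tabular Q; returns the updated rho."""
    # Q learns the differential return r - rho instead of a discounted one.
    Q[s, a] += alpha * (r - rho + np.max(Q[s_next]) - Q[s, a])
    # rho is updated only when the executed action is the greedy one.
    if Q[s, a] == np.max(Q[s]):
        rho += beta * (r - rho + np.max(Q[s_next]) - np.max(Q[s]))
    return rho
```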
In most practical cases, a service is instantiated and then used for a long time; the revenue is computed according to the period of usage! How can this be modeled?
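One possible (hypothetical) way to model it: instead of a one-shot reward at instantiation time, accrue the revenue at every decision step while the service is still running. All names below are illustrative assumptions.

```python
def step_revenue(active_services, step_length):
    """Reward component for one decision step: each running service contributes
    its per-unit-time rate multiplied by the step length."""
    return sum(svc["rate"] * step_length for svc in active_services)
```

An alternative would be to credit the expected lifetime revenue at admission time, at the price of having to estimate the holding period.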
These are configured per provider domain:
1- Quota
2- Overcharge scale
3- Overcharge quota scale
Cost rule: demand < quota --> the normal cost; quota < demand < quota * overcharge quota scale --> ...
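A hypothetical sketch of this cost rule; only the first branch is stated above, so the overcharge branch, the rejection case, and all parameter names are assumptions.

```python
def resource_cost(demand, quota, overcharge_scale, overcharge_quota_scale, unit_cost):
    """Hypothetical per-domain cost rule (assumed interpretation of the note above)."""
    if demand <= quota:
        # Stated case: within the quota, the normal cost applies.
        return demand * unit_cost
    if demand <= quota * overcharge_quota_scale:
        # Assumed overcharge branch: the excess over the quota is charged
        # at unit_cost * overcharge_scale.
        return quota * unit_cost + (demand - quota) * unit_cost * overcharge_scale
    # Beyond the overcharge quota: not specified in the note (e.g. reject the demand).
    raise ValueError("demand exceeds the overcharge quota")
```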