Bahador Bakhshi
In addition to the random policy, it is better to also use random initial states during the initial exploration phase. But how can this be implemented?
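One possible implementation, as a minimal sketch: force a random starting state at the beginning of each exploration episode and act randomly from there. It assumes a Gym-style environment; the `set_state` helper is an assumption, since not every environment lets you set its state directly.

```python
def collect_random_start_transitions(env, episodes):
    """Exploration phase: random initial state + random policy ("exploring starts")."""
    transitions = []
    for _ in range(episodes):
        env.reset()
        state = env.observation_space.sample()   # random initial state
        env.unwrapped.set_state(state)           # hypothetical setter; not all envs expose one
        done = False
        while not done:
            action = env.action_space.sample()   # random policy
            next_state, reward, done, info = env.step(action)
            transitions.append((state, action, reward, next_state, done))
            state = next_state
    return transitions
```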
"Table 10-1. Typical regression MLP architecture" in the "Hands on ..." provides a good summary of typical settings for MLP
In the DeepMind paper, they stated that they also keep exploring (using the ε-greedy approach with a very small ε) to avoid overfitting.
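For reference, a minimal ε-greedy action selection; the ε value below is illustrative, not the one used in the paper.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.05):
    """Random action with probability epsilon, greedy action otherwise.
    A small epsilon keeps a little exploration even at evaluation time."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))
```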
The n-step Tree Backup approach can also be used.
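As a reminder (following the standard formulation, e.g. Sutton & Barto), the n-step tree-backup return bootstraps through the target policy π at every step, so no importance sampling is needed:

```latex
% n-step tree-backup return (target policy \pi) and the resulting Q update
\begin{align*}
G_{t:t+1} &= R_{t+1} + \gamma \sum_{a} \pi(a \mid S_{t+1})\, Q(S_{t+1}, a) \\
G_{t:t+n} &= R_{t+1} + \gamma \sum_{a \neq A_{t+1}} \pi(a \mid S_{t+1})\, Q(S_{t+1}, a)
             + \gamma\, \pi(A_{t+1} \mid S_{t+1})\, G_{t+1:t+n} \\
Q(S_t, A_t) &\leftarrow Q(S_t, A_t) + \alpha \bigl[ G_{t:t+n} - Q(S_t, A_t) \bigr]
\end{align*}
```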
In the overcharging model, if a resource still has sufficient capacity, the demand should not be overcharged!
In general, initializing Q with large values (i.e., being more optimistic) can help achieve better exploration.
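A minimal sketch of optimistic initialization for a tabular Q; the table shape and the value 10.0 are placeholders.

```python
import numpy as np

n_states, n_actions = 100, 4      # placeholder sizes
OPTIMISTIC_VALUE = 10.0           # larger than any realistic return

# Unvisited (state, action) pairs look attractive, so even a greedy policy
# keeps trying them until their estimates are pulled down by real returns.
q_table = np.full((n_states, n_actions), OPTIMISTIC_VALUE)
```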
The paper "Average-reward model-free reinforcement learning" proposed/reviewed different approaches to approximating ρ (the average reward) in R-learning.
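For context, the basic R-learning updates (a sketch of the classic Schwartz-style rule, not of the specific variants discussed in that paper); ρ is the average-reward estimate, α and β are step sizes.

```python
import numpy as np

def r_learning_update(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One R-learning step on a tabular Q; returns the updated rho."""
    # Q learns the differential return r - rho instead of a discounted one.
    Q[s, a] += alpha * (r - rho + np.max(Q[s_next]) - Q[s, a])
    # rho is updated only when the executed action is the greedy one.
    if Q[s, a] == np.max(Q[s]):
        rho += beta * (r - rho + np.max(Q[s_next]) - np.max(Q[s]))
    return rho
```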
In most practical cases, a service is instantiated and then used for a long time; the revenue is computed according to the period of usage! How can this be modeled?
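One possible (hypothetical) way to model it: instead of a one-shot reward at instantiation time, accrue the revenue at every decision step while the service is still running. All names below are illustrative assumptions.

```python
def step_revenue(active_services, step_length):
    """Reward component for one decision step: each running service contributes
    its per-unit-time rate multiplied by the step length."""
    return sum(svc["rate"] * step_length for svc in active_services)
```

An alternative would be to credit the expected lifetime revenue at admission time, at the price of having to estimate the holding period.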
These are configured per provider domain:
1- Quota
2- Overcharge scale
3- Overcharge quota scale
Cost rule: demand < quota --> the normal cost; quota < demand < quota * overcharge quota scale --> ...
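A hypothetical sketch of this cost rule; only the first branch is stated above, so the overcharge branch, the rejection case, and all parameter names are assumptions.

```python
def resource_cost(demand, quota, overcharge_scale, overcharge_quota_scale, unit_cost):
    """Hypothetical per-domain cost rule (assumed interpretation of the note above)."""
    if demand <= quota:
        # Stated case: within the quota, the normal cost applies.
        return demand * unit_cost
    if demand <= quota * overcharge_quota_scale:
        # Assumed overcharge branch: the excess over the quota is charged
        # at unit_cost * overcharge_scale.
        return quota * unit_cost + (demand - quota) * unit_cost * overcharge_scale
    # Beyond the overcharge quota: not specified in the note (e.g. reject the demand).
    raise ValueError("demand exceeds the overcharge quota")
```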