Bahador Bakhshi

Results 22 issues of Bahador Bakhshi

In addition to the random policy, it is better to use random states in the initial exploration phase. But how to implement it?

"Table 10-1. Typical regression MLP architecture" in the "Hands on ..." provides a good summary of typical settings for MLP

The DeepMind paper states that they also explore (using the ε-greedy approach with a very small ε) to avoid overfitting.
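A minimal sketch of ε-greedy action selection, to make the note concrete. The function name `epsilon_greedy` and the default `epsilon=0.01` are placeholders chosen for illustration; they are not taken from the paper.

```python
import random


def epsilon_greedy(q_values, epsilon=0.01, rng=random):
    """Return an action index for a list of Q-values.

    With probability `epsilon` a uniformly random action is returned,
    otherwise the greedy (highest-valued) action is returned.
    `epsilon=0.01` is an assumed placeholder value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value (first one on ties).
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Passing a seeded `random.Random` instance as `rng` keeps runs reproducible.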

The n-step Tree Backup approach can also be used.

In overcharging, if a resource has sufficient capacity, it should not be overcharged!!!

In general, initializing Q with large values (being more optimistic) can lead to better exploration.
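A small sketch of why optimistic initial values drive exploration, using a purely greedy agent on a toy multi-armed bandit. Everything here (the function name `run_greedy_bandit`, the Gaussian rewards, the step size) is an assumption made for illustration.

```python
import random


def run_greedy_bandit(true_means, initial_q, steps=500, alpha=0.1, seed=0):
    """Run a purely greedy agent on a Gaussian bandit (hypothetical setup).

    Returns the final Q estimates and how often each arm was pulled.
    """
    rng = random.Random(seed)
    q = [float(initial_q)] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        # Always act greedily; exploration comes only from the initial values.
        a = max(range(len(q)), key=q.__getitem__)
        reward = rng.gauss(true_means[a], 1.0)
        # Constant step-size update pulls the estimate toward the reward.
        q[a] += alpha * (reward - q[a])
        counts[a] += 1
    return q, counts


# Optimistic start: each pull "disappoints" and lowers that arm's estimate,
# so the untried arms look better and get tried next.
q_opt, counts_opt = run_greedy_bandit([0.0, 0.5, 1.0], initial_q=10.0)
```

With `initial_q=10.0` every arm is pulled early on, even though the agent never takes a random action.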

The paper "Average-reward model-free reinforcement learning" proposed/reviewed different approaches to approximating rho in R-Learning.
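For context, here is a sketch of the classic tabular R-learning update in its textbook form, which shows where rho enters. It is not one of the approximations from the paper; the function name, the dictionary layout of `q`, and the step sizes `alpha` and `beta` are assumptions.

```python
def r_learning_step(q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """Apply one tabular R-learning update and return the new rho.

    `q` maps each state to a list of action values and is updated in place.
    `rho` is the running estimate of the average reward per step.
    """
    # Average-adjusted TD error: the reward is measured relative to rho.
    delta = r - rho + max(q[s_next]) - q[s][a]
    q[s][a] += alpha * delta
    # rho is only updated when the action taken is greedy in state s.
    if q[s][a] == max(q[s]):
        rho += beta * (r - rho + max(q[s_next]) - max(q[s]))
    return rho
```

Other ways of estimating rho would replace only the last update, leaving the Q update unchanged.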

In most practical cases, the service is instantiated and then used for a long time. The revenue is computed according to the period of usage!!! How can it be modeled?...

These are configured for the provider domain:
1- Quota
2- Overcharge scale
3- Overcharge quota scale

demand < quota --> the normal cost
quota > demand > quota * quota scale...