
How to calculate MF-Value (eq(10)) in MF-AC/MF-Q

Open rezunli96 opened this issue 5 years ago • 1 comments

Hi, I have recently been trying to reproduce your work and got a little confused when implementing MF-AC. According to the algorithm, the MF-value in eq. (10) has to be calculated at some point, and it seems to involve a lot of computation to enumerate all possible mean-field actions and their probabilities. I took a look at your MF-AC implementation in the battle game, but it appears to me (please correct me if I am wrong) that the MF-values there are replaced with the returns from the sampled trajectory? Could you explain more about how to calculate the MF-value in eq. (10), for both MF-AC and MF-Q? Thanks

rezunli96 · Mar 14 '19 16:03

It just occurred to me that the return of the sampled trajectory is an unbiased estimator of the MF-value? That works for a REINFORCE-like AC. But I am still confused about how to calculate it for off-policy RL like MF-Q.

rezunli96 · Mar 14 '19 16:03
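
To make the two cases in the question concrete, here is a minimal sketch. It is not the repository's code: the function names (`discounted_returns`, `boltzmann`, `sampled_mf_value`) are hypothetical, and the second part assumes that eq. (10) has the form of a policy-weighted sum of Q-values over the agent's own actions, evaluated at a single observed (sampled) mean action rather than an enumeration over all mean actions. Whether the authors do exactly this for MF-Q is not confirmed by the thread.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.95):
    """Monte Carlo return G_t = r_t + gamma * G_{t+1} for every step.

    On-policy (REINFORCE-like AC): G_t from a sampled trajectory can
    stand in for the value target, so nothing is enumerated.
    """
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def boltzmann(q_values, temperature=1.0):
    """Softmax policy over the agent's own actions (numerically stable)."""
    z = (q_values - np.max(q_values)) / temperature
    p = np.exp(z)
    return p / p.sum()


def sampled_mf_value(q_row, temperature=1.0):
    """Assumed single-sample estimate of the MF-value at the next state.

    q_row[a] is Q(s', a, mean_action) for ONE observed mean action, so
    the sum runs only over the agent's own actions.
    """
    pi = boltzmann(q_row, temperature)
    return float(pi @ q_row)


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
    print(sampled_mf_value(np.array([1.0, 2.0, 0.5])))
```

In this sketch the off-policy target for MF-Q would be `r + gamma * sampled_mf_value(q_row)`, where `q_row` comes from a target network evaluated at the next state and the neighbours' observed mean action; that usage is likewise an assumption.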