Multi-Agent-Transformer

Question about the monotonic improvement guarantee of MAT.

Open CrazySssst opened this issue 9 months ago • 0 comments

Great work!

I am very interested in why MAT can keep the monotonic improvement guarantee while avoiding sequential updates.

To guarantee monotonic improvement, HAPPO updates the policies one by one during training, with each update using the results of the previous ones. That means that to update ${\pi}^2_{old}$, we have to wait for ${\pi}^1_{new}$.
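To make the dependency concrete, here is a minimal sketch of the sequential scheme as I understand it. It is not HAPPO's actual code: `happo_style_sweep` and the `update_agent` callback are hypothetical names, and the callback stands in for whatever optimisation step an agent performs.

```python
import numpy as np

def happo_style_sweep(advantages, update_agent, n_agents):
    """Sequential sweep over agents (illustrative sketch, not HAPPO's code).

    advantages:   array of shape (batch,), joint advantage estimates.
    update_agent: hypothetical callback (m, weighted_adv) -> array of shape
                  (batch,) holding pi_new^m(a|s) / pi_old^m(a|s) after
                  agent m has been updated.
    Returns the weighted advantage each agent was trained on, shape
    (n_agents, batch).
    """
    running = np.ones_like(advantages)  # product of ratios of already-updated agents
    used = []
    for m in range(n_agents):
        weighted_adv = running * advantages  # depends on agents 0..m-1
        used.append(weighted_adv)
        ratio = update_agent(m, weighted_adv)  # agent m cannot start before this point
        running = running * ratio
    return np.stack(used)
```

The loop cannot be parallelised over `m`, because the weighted advantage for agent `m` is only known once all earlier agents have finished their updates.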

There is only a rough discussion about this issue in the paper: image

After carefully checking the HAPPO paper, I found that MAT's Eq. 5 is not the same as Eq. 11 in the HAPPO paper. Specifically, MAT's Eq. 5 omits the first term of $M^{i_{1:m}}$, which depends on previous update results, e.g., ${\pi}^1_{new}$.
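To spell out the difference as I read it (the notation is my paraphrase and may not match the two papers symbol for symbol), HAPPO weights the advantage by the ratio of the already-updated agents:

$$
M^{i_{1:m}}(s, \mathbf{a}) = \frac{\pi_{new}^{i_{1:m-1}}(a^{i_{1:m-1}} \mid s)}{\pi_{old}^{i_{1:m-1}}(a^{i_{1:m-1}} \mid s)} \, \hat{A}(s, \mathbf{a})
$$

and agent $i_m$ then maximizes the clipped objective

$$
\mathbb{E}\left[ \min\left( r^{i_m} M^{i_{1:m}}(s, \mathbf{a}),\ \mathrm{clip}\left(r^{i_m}, 1 \pm \epsilon\right) M^{i_{1:m}}(s, \mathbf{a}) \right) \right],
$$

where $r^{i_m}$ is the ratio of agent $i_m$'s new policy to its old policy. MAT's objective, as I understand it, has the same clipped form but uses $\hat{A}$ directly in place of $M^{i_{1:m}}$:

$$
\mathbb{E}\left[ \min\left( r^{i_m} \hat{A},\ \mathrm{clip}\left(r^{i_m}, 1 \pm \epsilon\right) \hat{A} \right) \right],
$$

so the leading ratio, the only part that depends on ${\pi}^{1}_{new}, \dots, {\pi}^{m-1}_{new}$, no longer appears.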

Can you explain why Eq. 5 still guarantees monotonic improvement?

This question has been bothering me for a long time and I look forward to getting your reply.

[three images attached]

CrazySssst, May 24 '24 05:05