Value net and memory blocks
Hey!
I have been studying your paper for inspiration for my current DRL projects, but I could not find the transformer that the paper mentions is used to add memory to the model. I did find a paper of yours, "Stabilizing Transformers for Reinforcement Learning", which notes that transformers for RL are too unstable to train and proposes some updates to the original architecture. I am facing this problem in my own projects, and I am curious how you solved it in this AlphaStar implementation. Did you use the approach from that paper?
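For context, the stabilization trick I mean is the GRU-style gating that the GTrXL paper (Parisotto et al.) puts in place of each residual connection, with the gate bias initialized so the layer starts close to an identity map. A minimal numpy sketch of that gating layer (my own reading of the paper, not code from this repo; all names and initializations here are my assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GRUGate:
    """GRU-style gating layer as described in "Stabilizing Transformers
    for Reinforcement Learning": replaces the residual add `x + y`
    after each attention / feed-forward sublayer."""

    def __init__(self, d, bias=2.0, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d)
        self.Wr, self.Ur = rng.uniform(-s, s, (d, d)), rng.uniform(-s, s, (d, d))
        self.Wz, self.Uz = rng.uniform(-s, s, (d, d)), rng.uniform(-s, s, (d, d))
        self.Wg, self.Ug = rng.uniform(-s, s, (d, d)), rng.uniform(-s, s, (d, d))
        # Positive bias keeps the update gate z near 0 at init,
        # so the block initially just passes x through (the key trick).
        self.bz = np.full(d, bias)

    def __call__(self, x, y):
        # x: residual-stream input, y: sublayer output
        r = sigmoid(y @ self.Wr + x @ self.Ur)          # reset gate
        z = sigmoid(y @ self.Wz + x @ self.Uz - self.bz)  # update gate
        h = np.tanh(y @ self.Wg + (r * x) @ self.Ug)    # candidate state
        return (1.0 - z) * x + z * h
```

With a large enough bias the output is almost exactly `x`, which is what makes early training behave like a shallow network.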
On the other hand, I also could not find the value network in the model. Does your value net need a large, complex MLP? Do you use any residual blocks?
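To be concrete about what I mean by residual blocks in the value net: something like a stack of MLP blocks with skip connections feeding a scalar baseline. A hypothetical sketch in numpy (my own guess at the shape of such a head; `ResidualMLPBlock`, `value_head`, and all sizes are assumptions, not taken from your code):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

class ResidualMLPBlock:
    """One residual MLP block of the kind a value head might stack:
    two linear layers with ReLUs, plus a skip connection."""

    def __init__(self, d, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d)
        self.W1 = rng.uniform(-s, s, (d, d))
        self.W2 = rng.uniform(-s, s, (d, d))
        self.b1 = np.zeros(d)
        self.b2 = np.zeros(d)

    def __call__(self, x):
        h = relu(x @ self.W1 + self.b1)
        return x + relu(h @ self.W2 + self.b2)  # skip connection

def value_head(x, blocks, w_out):
    # Run the features through each residual block,
    # then project down to a scalar value per batch element.
    for blk in blocks:
        x = blk(x)
    return x @ w_out
```

Is this roughly the structure you use, or is the baseline just a small plain MLP?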
I hope we can discuss these topics. You can also reach me at [email protected] if you prefer. Having the opportunity to see and learn from this AlphaStar implementation is really cool!
Regards,
Rodrigo
Hi, have you received any response? I am facing exactly the same problem.