No input injection into the H-module
Hi, I’m really impressed by your work, and I have a quick question about the HRM network. I’d love to understand more about the motivation behind one particular design choice.
In your model, the input embedding is added only to the L-module branch, not to the H-module branch. Is there a particular reason for this design choice? For example, did you find in experiments that injecting the embedding into the H-module degrades performance?
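To make the question concrete, here is a minimal sketch of a two-timescale recurrence where the input embedding is injected only on the L-module branch. It is not the HRM code. Each module is a hypothetical toy block (one `tanh` layer applied to the sum of its state and its injection) standing in for the real modules, both states start at zero, and `n_cycles` / `t_steps` are illustrative names for the number of slow H updates and the fast L steps per H update. The `inject_into_h` flag is a hypothetical switch for the ablation asked about above, not an option of the released model.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


class ToyBlock:
    """Hypothetical stand-in for a recurrent module: new = tanh(W @ (state + injection))."""

    def __init__(self, dim: int, rng: random.Random) -> None:
        scale = 1.0 / math.sqrt(dim)
        self.w = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, state: Vector, injection: Vector) -> Vector:
        x = add(state, injection)
        return [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in self.w]


def hrm_forward(
    x_emb: Vector,
    n_cycles: int,
    t_steps: int,
    dim: int,
    inject_into_h: bool = False,  # hypothetical ablation flag
    seed: int = 0,
) -> Tuple[Vector, Vector]:
    if len(x_emb) != dim:
        raise ValueError("embedding size must equal dim")
    rng = random.Random(seed)  # same seed -> same weights across calls
    f_l = ToyBlock(dim, rng)
    f_h = ToyBlock(dim, rng)
    z_l = [0.0] * dim
    z_h = [0.0] * dim
    for _ in range(n_cycles):
        for _ in range(t_steps):
            # Fast L-module: conditioned on the current H-state AND the input embedding.
            z_l = f_l(z_l, add(z_h, x_emb))
        # Slow H-module: updated once per cycle, by default seeing only the L-state.
        h_injection = add(z_l, x_emb) if inject_into_h else z_l
        z_h = f_h(z_h, h_injection)
    return z_h, z_l


x = [0.5, -0.2, 0.1, 0.3]
z_h, z_l = hrm_forward(x, n_cycles=2, t_steps=3, dim=4)
z_h_ablate, _ = hrm_forward(x, n_cycles=2, t_steps=3, dim=4, inject_into_h=True)
```

In this sketch the H-module sees the raw input only indirectly, through the L-state it summarizes once per cycle. Comparing `z_h` against `z_h_ablate` on a real task is the kind of experiment the question is asking about.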
Lower-level details end up confusing the slow strategic module, no? I can't imagine the CEO reviewing every single line of code his engineering team generates.
Makes sense
Also, I suspect this V1 is focused on simplicity.