block-recurrent-transformer
What are the learned state IDs? Are they just a trainable tensor of weights? I'm currently trying to implement this, and knowing would help me a lot.