RWKV-LM
Meet-in-the-middle style of training
It would be interesting to see whether the training approach from Microsoft's new "Meet in the Middle" paper (https://arxiv.org/pdf/2303.07295.pdf) would give RWKV the same improvements the paper reports. I don't see why it wouldn't.
Is this something in the pipeline?