MarkYang:
@hxs91 My hypothesis is that FoT uses a training strategy similar to Recurrent Memory Transformer's: if you want to train a local context of 2k with 4 segments, you...
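For illustration, here is a minimal sketch of the RMT-style segment training the comment alludes to: a long sequence is split into 4 segments of 2k tokens, and learned memory tokens are passed recurrently from one segment to the next, with gradients flowing back through the recurrence. All names and sizes (`SegmentEncoder`, `N_MEM`, the placeholder loss) are assumptions for the sketch, not FoT's or RMT's actual code.

```python
import torch
import torch.nn as nn

SEG_LEN, N_SEG, D_MODEL, N_MEM = 2048, 4, 512, 16  # assumed sizes

class SegmentEncoder(nn.Module):
    """Stand-in for a transformer block that also reads/writes memory tokens."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, seg, mem):
        # Prepend memory tokens to the segment and attend over both.
        x = self.encoder(torch.cat([mem, seg], dim=1))
        new_mem, out = x[:, :N_MEM], x[:, N_MEM:]
        return out, new_mem

model = SegmentEncoder()
tokens = torch.randn(1, SEG_LEN * N_SEG, D_MODEL)  # dummy 8k-token batch
mem = torch.zeros(1, N_MEM, D_MODEL)               # initial memory state

outputs = []
for i in range(N_SEG):
    seg = tokens[:, i * SEG_LEN:(i + 1) * SEG_LEN]
    out, mem = model(seg, mem)  # memory carries context across segments
    outputs.append(out)

loss = torch.cat(outputs, dim=1).pow(2).mean()  # placeholder loss
loss.backward()                                 # BPTT through all 4 segments
```

The key point of this setup is that backpropagation runs through the memory recurrence, so earlier segments receive gradient signal from later ones even though attention within each step only spans a 2k local context.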