NestedFormer
Understanding TokenLearner
I am trying to understand the TokenLearner, and I believe the current implementation differs from the original TokenLearner, right? As I understand it, the original TokenLearner performs token learning separately for each temporal dimension and then reshapes back.
In this work, the TokenLearner only controls the sequence length. It doesn't need to reshape back because the network has a segmentation decoder.
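For intuition, here is a minimal NumPy sketch of the sequence-length reduction being discussed: each of `k` output tokens is an attention-weighted sum over all `N` input tokens, so the sequence length changes from `N` to `k` with no reshape back. The weight matrix `w` and the function name are illustrative assumptions, not NestedFormer's actual code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_learner(x, w):
    """Reduce N input tokens to k learned tokens.

    x: (N, C) token features
    w: (C, k) hypothetical learned projection producing one score map per slot
    Returns (k, C): each output token is an attention-weighted sum of inputs.
    """
    logits = x @ w                     # (N, k) per-token score for each output slot
    attn = softmax(logits.T, axis=-1)  # (k, N) attention over the input sequence
    return attn @ x                    # (k, C) reduced token set

rng = np.random.default_rng(0)
N, C, K = 64, 32, 8
x = rng.standard_normal((N, C))
w = rng.standard_normal((C, K))
z = token_learner(x, w)
print(z.shape)  # (8, 32)
```

Since the downstream decoder consumes the reduced `(k, C)` sequence directly, there is no inverse step restoring the original `N` tokens.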