Eryk Rakowski

Results 56 comments of Eryk Rakowski

@tridao Does Mamba support passing state between multiple forward passes (or blocks of tokens) during training?

@tridao What would it take to support passing state between forward passes? I can see it's possible to do this via inference_params, so where's the catch?

@albertfgu Would the states be differentiable? Unlike @robflynnyh, I would rather not stop the gradient during training.

I'm thinking of backpropagation through time across multiple chunks (e.g. 4) instead of fitting the full sequence in one huge window.
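
To make the idea concrete, here is a minimal sketch on a toy scalar recurrence h_t = a * h_{t-1} + b * x_t. This is not Mamba's kernel or API; the names run_chunk and run_sequence are made up for illustration. For simplicity it carries the sensitivity dh/da forward alongside the state (forward-mode differentiation) rather than doing a reverse pass, which gives the same gradient for a single scalar parameter. Setting detach=True zeroes that sensitivity at each chunk boundary, mimicking a stop-gradient.

```python
# Toy scalar recurrence h_t = a * h_{t-1} + b * x_t with loss = sum of h_t.
# Assumption: this only illustrates carrying a differentiable state across
# chunks; it is NOT how Mamba is implemented.

def run_chunk(xs, a, b, h, dh_da, detach=False):
    """Process one chunk, starting from the carried state.

    h      -- state handed over from the previous chunk
    dh_da  -- sensitivity of that state to `a`, also handed over
    detach -- if True, drop the incoming sensitivity (stop-gradient)
    Returns (loss, dloss_da, h, dh_da) for this chunk.
    """
    if detach:
        dh_da = 0.0
    loss, dloss_da = 0.0, 0.0
    for x in xs:
        dh_da = h + a * dh_da  # derivative of a*h_prev + b*x w.r.t. a
        h = a * h + b * x
        loss += h
        dloss_da += dh_da
    return loss, dloss_da, h, dh_da


def run_sequence(xs, a, b, chunk_size, detach=False):
    """Split xs into chunks and pass the state from one chunk to the next."""
    h, dh_da = 0.0, 0.0
    total_loss, total_grad = 0.0, 0.0
    for start in range(0, len(xs), chunk_size):
        chunk = xs[start:start + chunk_size]
        loss, grad, h, dh_da = run_chunk(chunk, a, b, h, dh_da, detach)
        total_loss += loss
        total_grad += grad
    return total_loss, total_grad
```

With the state and its sensitivity both carried over, processing in chunks should give the same loss and gradient as one full window; with detach=True the loss is unchanged but the gradient loses the contribution that flows across chunk boundaries.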

Finally someone did it! Does it also support a negative bias (i.e., a mask)?