
Inference multiple tokens

Open PlayerSAL opened this issue 1 year ago • 2 comments

Thank you for your great work! I have a question about running inference on multiple tokens at a time. For instance, I have 10k tokens to run inference on and I want to process them in batches: 10 tokens per step, repeated 1000 times. Is it possible to make Mamba work this way?

PlayerSAL avatar Jul 17 '24 07:07 PlayerSAL

Probably yes. How would you do it with Transformers?

tridao avatar Jul 17 '24 18:07 tridao

I have already implemented decoding N tokens in a batch instead of one token at a time: https://github.com/state-spaces/mamba/pull/477 @PlayerSAL @tridao

AnaRhisT94 avatar Jul 18 '24 12:07 AnaRhisT94
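
For readers wondering why this should work at all, here is a minimal sketch of the idea behind the question: a recurrent model can consume a long sequence in fixed-size chunks as long as the hidden state is carried from one chunk to the next. This is a toy scalar linear recurrence in plain Python, not Mamba's API and not the code from the PR above. The names `scan_chunk` and `run_chunked` and the constants `a`, `b`, `c` are hypothetical and chosen only for illustration. A real Mamba layer would also need to carry its short convolution state across chunks.

```python
from typing import List, Tuple


def scan_chunk(
    xs: List[float], h: float, a: float = 0.9, b: float = 0.5, c: float = 2.0
) -> Tuple[List[float], float]:
    """Run the toy recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t over one chunk.

    Returns the chunk's outputs and the final hidden state, so the caller
    can feed that state into the next chunk.
    """
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys, h


def run_chunked(tokens: List[float], chunk_size: int) -> List[float]:
    """Process `tokens` chunk by chunk, carrying the hidden state across chunks."""
    h = 0.0  # initial state (assumed zero for this sketch)
    out: List[float] = []
    for start in range(0, len(tokens), chunk_size):
        ys, h = scan_chunk(tokens[start:start + chunk_size], h)
        out.extend(ys)
    return out


# 10k toy "tokens", processed 10 at a time for 1000 steps.
tokens = [float(i % 7) for i in range(10_000)]
full, _ = scan_chunk(tokens, 0.0)   # one pass over the whole sequence
chunked = run_chunked(tokens, 10)   # 1000 passes over 10 tokens each

# The same arithmetic is applied in the same order, so the outputs match.
assert chunked == full
```

The chunk size only changes how the work is scheduled; the result depends on the carried state, which is why the 10-tokens-by-1000-steps layout from the question is equivalent to a single pass in this sketch.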