mamba
Inference with multiple tokens
Thank you for your great work! I have a question about inference with multiple tokens. For instance, I have 10k tokens to run inference on and I want to process them in batches: handle 10 tokens at a time, over 1000 steps. Is it possible to make Mamba work this way?
Probably yes. How would you do it with Transformers?
I have already implemented decoding N tokens in a batch instead of one token at a time. @PlayerSAL https://github.com/state-spaces/mamba/pull/477 @tridao
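For anyone wondering why chunked processing is possible at all: Mamba's SSM layer carries a recurrent state, so a long sequence can be split into fixed-size chunks as long as the state is passed from one chunk to the next. Here is a minimal pure-Python toy illustrating that principle (this is NOT the Mamba API; the scalar recurrence `h = a*h + b*x` is a hypothetical stand-in for the real SSM state update):

```python
def scan(h, xs, a=0.9, b=1.0):
    """Toy linear recurrence h_t = a*h_{t-1} + b*x_t.

    A stand-in for an SSM state update: the only thing that must
    survive between calls is the state ``h``.
    """
    for x in xs:
        h = a * h + b * x
    return h

tokens = list(range(10_000))  # e.g. 10k tokens

# One-shot: process the whole sequence in a single pass.
full = scan(0.0, tokens)

# Chunked: 1000 passes of 10 tokens each, carrying the state across chunks.
h = 0.0
for i in range(0, len(tokens), 10):
    h = scan(h, tokens[i:i + 10])

# Same operations in the same order, so the results match exactly.
assert h == full
```

The same carry-the-state pattern is what a multi-token decoding implementation (like the one in PR #477 above) relies on, just with the model's actual inference cache instead of a scalar.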