mamba
Inference with multiple tokens
Thank you for your great work! I have a question about inference with multiple tokens. For instance, I have 10k tokens to run inference on and I want to process them in batches: handle 10 tokens at a time, over 1000 steps. Is it possible to make Mamba work this way?
Probably yes. How would you do it with Transformers?
I have already implemented decoding N tokens in a batch instead of one token at a time. @PlayerSAL https://github.com/state-spaces/mamba/pull/477 @tridao
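For anyone wondering why chunked processing is possible at all: Mamba's SSM layer carries a recurrent state, so a long sequence can be split into fixed-size chunks as long as the state is passed from one chunk to the next. Here is a minimal pure-Python toy illustrating that principle (this is NOT the Mamba API; the scalar recurrence `h = a*h + b*x` is a hypothetical stand-in for the real SSM state update):

```python
def scan(h, xs, a=0.9, b=1.0):
    """Toy linear recurrence h_t = a*h_{t-1} + b*x_t.

    A stand-in for an SSM state update: the only thing that must
    survive between calls is the state ``h``.
    """
    for x in xs:
        h = a * h + b * x
    return h

tokens = list(range(10_000))  # e.g. 10k tokens

# One-shot: process the whole sequence in a single pass.
full = scan(0.0, tokens)

# Chunked: 1000 passes of 10 tokens each, carrying the state across chunks.
h = 0.0
for i in range(0, len(tokens), 10):
    h = scan(h, tokens[i:i + 10])

# Same operations in the same order, so the results match exactly.
assert h == full
```

The same carry-the-state pattern is what a multi-token decoding implementation (like the one in PR #477 above) relies on, just with the model's actual inference cache instead of a scalar.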