
Add documentation/tests on how to use inference_params to Mamba to generate sequences by parts

Maykeye opened this issue on Jan 30 '24 • 4 comments

Right now there are no tests or docs on how to continue generation from a given state.

I think I figured it out: at least generating by parts produces the same output as generating the whole sequence at once, though I'm not 100% sure (the exact seqlen_offset value is not used by the Mamba class, only whether it is > 0, and max_batch_size is not even mentioned in the class). So if the test is not completely bogus, I can wrap it in pytest and make a PR.
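Roughly, the comparison I have in mind is the sketch below. It's a minimal, unverified example assuming mamba_ssm's exported Mamba block and InferenceParams; the d_model, sequence length, and split point are arbitrary:

```python
import torch
from mamba_ssm import Mamba
from mamba_ssm.utils.generation import InferenceParams

torch.manual_seed(0)
model = Mamba(d_model=64, layer_idx=0).to("cuda")  # layer_idx is required for state caching
x = torch.randn(1, 16, 64, device="cuda")

with torch.no_grad():
    # Reference: one pass over the whole sequence.
    y_full = model(x)

    # Chunked: process a prefix with seqlen_offset == 0 (this fills the
    # conv/ssm states in key_value_memory_dict), then feed the remaining
    # tokens one at a time, which routes through the single-token step path.
    params = InferenceParams(max_seqlen=16, max_batch_size=1)
    split = 8
    y_parts = [model(x[:, :split], inference_params=params)]
    params.seqlen_offset += split
    for t in range(split, 16):
        y_parts.append(model(x[:, t : t + 1], inference_params=params))
        params.seqlen_offset += 1
    y_chunked = torch.cat(y_parts, dim=1)

# The two outputs should agree up to kernel-level numerics.
print(torch.allclose(y_full, y_chunked, atol=1e-4))
```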

Maykeye, Jan 30 '24 11:01

Can you explain the use case here? Would this be like: if the model is handling topic A, we're using and updating state A for each inference?

Eupham, Feb 01 '24 14:02

> Can you explain the use case here? Would this be like: if the model is handling topic A, we're using and updating state A for each inference?

Yes, manual cache handling over a long sequence of text and a long period of time.

E.g. "Chapter N: previous context goes here. State at this point should be cached for days and stored to disk to not reparse things.\n\n Chapter N+1: (lots of long text being rewritten)"

Maykeye, Feb 02 '24 06:02

Just to be sure, is this for a different use case than using cg=True for generate? Like in generate, which is backed by DecodingCGCache and capture_graph in https://github.com/state-spaces/mamba/blob/main/mamba_ssm/utils/generation.py
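For reference, that cg=True path looks roughly like the repo's generation benchmark script; the model and tokenizer names below are just examples:

```python
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m",
                                         device="cuda", dtype=torch.float16)

input_ids = tokenizer("Chapter 1:", return_tensors="pt").input_ids.to("cuda")
# cg=True captures the decoding step in a CUDA graph (via DecodingCGCache);
# the cache lifetime is managed inside generate, not by the caller.
out = model.generate(input_ids=input_ids, max_length=64, cg=True)
print(tokenizer.decode(out[0]))
```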

I'm going to try this out on a handbook or two and see how it does.

Eupham, Feb 02 '24 21:02

Similar. It's about manual control over every aspect of the cache (and hence the state) of the model. The model itself uses InferenceParams.
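For reference, the relevant container is the InferenceParams dataclass in mamba_ssm/utils/generation.py; abbreviated from the source (fields may differ slightly between versions), it looks roughly like this:

```python
from dataclasses import dataclass, field
from typing import Optional
import torch

@dataclass
class InferenceParams:
    """Inference parameters passed to the model to efficiently
    calculate and store the context during inference."""
    max_seqlen: int
    max_batch_size: int
    seqlen_offset: int = 0
    batch_size_offset: int = 0
    key_value_memory_dict: dict = field(default_factory=dict)
    lengths_per_sample: Optional[torch.Tensor] = None
```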

Maykeye, Feb 03 '24 11:02