
Make StaticCache configurable at model construct time

Open guangy10 opened this issue 6 months ago • 8 comments

What does this PR do?

This PR addresses #32500, "Export to ExecuTorch".

It enables loading a model with the option to statically configure it with StaticCache at construction time:

from transformers import AutoModelForCausalLM, GenerationConfig

model = AutoModelForCausalLM.from_pretrained(
    hf_model_repo,
    attn_implementation="sdpa",
    generation_config=GenerationConfig(
        use_cache=True,
        cache_implementation=cache_implementation,
        max_length=max_cache_len,
        cache_config={
            "batch_size": batch_size,
            "max_cache_len": max_cache_len,
        },
    ),
)
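For reference, the placeholders above could be defined before the call along these lines; the specific values are illustrative and not taken from this PR, except for gemma-2b, which is the test model used here:

hf_model_repo = "google/gemma-2b"  # test model used in this PR
cache_implementation = "static"    # request StaticCache
batch_size = 1                     # illustrative value
max_cache_len = 128                # illustrative value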

It also creates a new integration point for ExecuTorch at transformers/integrations/executorch.py, which hosts the wrapper module class and the convert_and_export utility.
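With that integration point in place, the export flow looks roughly like the sketch below. This is a minimal illustration only: the exact signature and return type of convert_and_export are defined in transformers/integrations/executorch.py in this PR and may differ from what is assumed here.

from transformers.integrations.executorch import convert_and_export

# The wrapper module handles the StaticCache inputs/outputs so the model can be
# exported with torch.export; the result is expected to be an exported program
# that ExecuTorch can then lower and run.
exported_program = convert_and_export(model)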

The test model gemma-2b is exportable out of the box via convert_and_export with StaticCache, and it is also lowerable and runnable via ExecuTorch! Check out https://github.com/pytorch/executorch/pull/4723 in ExecuTorch to reproduce.

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@ArthurZucker @amyeroberts @gante

guangy10 · Aug 15 '24 01:08