
Is there a generic way to change model's `max_tokens` dynamically?

Open • lambdaofgod opened this issue 4 months ago • 9 comments

I have a use case where I want to use the same LM in different steps, but with a different max_tokens and context size in each. I use dspy.configure to set this LM.

Some of my steps will require much shorter output, so it would be useful to do something like this:

with dspy.context(max_tokens=32):
    result = module1(first_input)

with dspy.context(max_tokens=128):
    result = module2(first_input)

Is it currently possible to simulate this? If yes, how can I pass this through to the LM's kwargs?

Obviously one could do this by resetting dspy.context(lm=...), but that would be pretty cumbersome. It also seems to me that the point of setting the LM via dspy.configure is that modules should abstract over which LM is actually used (though exposing a key parameter like the context length or the number of generated tokens seems reasonable).

lambdaofgod avatar Apr 10 '24 18:04 lambdaofgod

I believe there is a context manager for this.

with dspy.settings.context(max_tokens=32):
    result = module1(first_input)

with dspy.settings.context(max_tokens=128):
    result = module2(first_input)

kylerush avatar Apr 10 '24 22:04 kylerush

@lambdaofgod you can set the LM per step through the context manager and dynamically change the respective LM's max_tokens as needed:

with dspy.context(lm=dspy.OpenAI(...., max_tokens=x)):
    ....
with dspy.context(lm=dspy.OpenAI(...., max_tokens=x+1)):
    ....
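
For concreteness, here's a self-contained sketch of that pattern; the model name, signatures, and token budgets below are placeholders, not anything prescribed by DSPy:

import dspy

# Placeholder model name and token budgets, purely for illustration.
short_lm = dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=32)
long_lm = dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=128)

# Toy modules so the example runs on its own.
module1 = dspy.Predict("question -> answer")
module2 = dspy.ChainOfThought("question -> answer")

first_input = "What does DSPy stand for?"

with dspy.context(lm=short_lm):
    result = module1(question=first_input)

with dspy.context(lm=long_lm):
    result = module2(question=first_input)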

arnavsinghvi11 avatar Apr 11 '24 06:04 arnavsinghvi11

@kylerush that doesn't work generically

@arnavsinghvi11 the whole point is to not do this; setting an LM inside a module leaks implementation details and hurts reusability: what if someone wants to use my module with their own LM? It seems reasonable to expect that a DSPy module could abstract over the LM while still exposing some crucial generation kwargs.

lambdaofgod avatar Apr 11 '24 09:04 lambdaofgod

Ok, so basically I wrote something like this, and it seems to work for LMs that have a kwargs field:

import dspy
from contextlib import contextmanager

@contextmanager
def override_lm_params(**kwargs):
    lm = dspy.settings.lm
    # Save the current values; this assumes every overridden key
    # already exists in lm.kwargs (otherwise it raises a KeyError).
    old_kwargs = {param_name: lm.kwargs[param_name] for param_name in kwargs.keys()}
    try:
        # Apply the overrides for the duration of the with-block.
        for param_name, param_value in kwargs.items():
            lm.kwargs[param_name] = param_value
        yield
    finally:
        # Restore the original values, even if an exception was raised.
        for param_name, param_value in old_kwargs.items():
            lm.kwargs[param_name] = param_value

When I run it, it correctly sets the parameters inside the context manager and then resets them upon exit.

Is this what dspy.context was supposed to do? If not, doesn't it seem useful to add such a feature?

Tests

my context manager

with override_lm_params(max_tokens=512):
    print("kwargs in contextmanager")
    print(dspy.settings.lm.kwargs["max_tokens"])

print("kwargs after contextmanager")
print(dspy.settings.lm.kwargs["max_tokens"])

kwargs in contextmanager
512
kwargs after contextmanager
1024

dspy.context

with dspy.context(max_tokens=512):
    print("kwargs in contextmanager")
    print(dspy.settings.lm.kwargs["max_tokens"])

print("kwargs after contextmanager")
print(dspy.settings.lm.kwargs["max_tokens"])

kwargs in contextmanager
1024
kwargs after contextmanager
1024

lambdaofgod avatar Apr 11 '24 15:04 lambdaofgod

@lambdaofgod would dspy.settings.config['lm'].kwargs['max_tokens'] = .. cover the abstraction here?

You wouldn't have to expose the defined lm, and you can still wrap modules using with dspy.context(lm=lm):

lm = dspy.OpenAI(model=..., max_tokens=32)

dspy.settings.configure(lm=lm)

with dspy.settings.context(lm=lm):
    result = module1(first_input)

dspy.settings.config['lm'].kwargs['max_tokens'] = 128

with dspy.settings.context(lm=lm):
    result = module2(first_input)

arnavsinghvi11 avatar Apr 11 '24 17:04 arnavsinghvi11

@arnavsinghvi11 what's the difference between this and my context manager?

lambdaofgod avatar Apr 14 '24 10:04 lambdaofgod

@lambdaofgod They are quite similar, except that the approach I suggested is already built into DSPy and wouldn't require any additional changes :).

arnavsinghvi11 avatar Apr 18 '24 16:04 arnavsinghvi11

That's the question: maybe someone else would find this util helpful? Or should we roll this behavior into dspy.context?

I was confused that dspy.context doesn't work this way.

lambdaofgod avatar Apr 20 '24 15:04 lambdaofgod

Makes sense @lambdaofgod . Feel free to push a PR that can better handle this behavior in dspy.context without impacting existing settings!

arnavsinghvi11 avatar Apr 27 '24 17:04 arnavsinghvi11

I've tried this with two models, and it's actually far from obvious how overriding parameters should work.

The problem is that Ollama, for example, reloads the model on the server when the context size changes. I think it would actually be better to just tell users to run two models, because my proposal could cause more confusion (whether a model gets reloaded depends on whether the LM object is actually a model or just a client).
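
For anyone who hits this: a minimal sketch of the two-model approach, assuming an Ollama server with both models available and that dspy.OllamaLocal accepts max_tokens and num_ctx in your DSPy version (the model names and sizes below are placeholders):

import dspy

# Two separate clients, each with fixed generation settings, so the
# server never has to reload a model just to change num_ctx mid-run.
# "mistral" and "mistral-longctx" are hypothetical model names.
short_lm = dspy.OllamaLocal(model="mistral", max_tokens=32, num_ctx=2048)
long_lm = dspy.OllamaLocal(model="mistral-longctx", max_tokens=128, num_ctx=8192)

module = dspy.Predict("question -> answer")

with dspy.context(lm=short_lm):
    short_answer = module(question="Summarize DSPy in one sentence.")

with dspy.context(lm=long_lm):
    long_answer = module(question="Explain DSPy in detail.")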

lambdaofgod avatar Apr 29 '24 21:04 lambdaofgod