dspy
Is there a generic way to change model's `max_tokens` dynamically?
I have a use case where I want to use the same LM in different steps with different max_tokens
and context size. I use dspy.configure
to set this LM.
Some of my steps will require much shorter output, so it would be useful to do something like this:
with dspy.context(max_tokens=32):
    result = module1(first_input)

with dspy.context(max_tokens=128):
    result = module2(first_input)
Is it currently possible to simulate this? If yes, how can I pass this to the LM's kwargs?
Obviously one could do this by resetting dspy.context(lm=...), but it would be pretty cumbersome. It also seems to me that the point of using dspy.configure to set the LM is that modules should abstract over which LM is actually used (though exposing a key parameter like the context length or the number of generated tokens seems reasonable).
I believe there is a context manager for this.
with dspy.settings.context(max_tokens=32):
    result = module1(first_input)

with dspy.settings.context(max_tokens=128):
    result = module2(first_input)
@lambdaofgod you can set the LM per step through the context manager and dynamically change the respective LM's tokens as needed:
with dspy.context(lm=dspy.OpenAI(...., max_tokens=x)):
    ....
with dspy.context(lm=dspy.OpenAI(...., max_tokens=x+1)):
    ....
@kylerush that doesn't work generically
@arnavsinghvi11 the whole point is to not do this; setting an LM inside a module leaks implementation details, which hurts reusability - what if someone wants to use my module with their own LM? It seems reasonable to assume that a DSPy module could abstract over the LM while exposing some crucial generation kwargs.
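To illustrate the concern, here is a minimal, dependency-free sketch (all names — StubLM, SummarizeModule — are hypothetical, not DSPy API) of a module that exposes generation kwargs while staying agnostic about which LM backend is plugged in:

```python
class StubLM:
    """Stand-in for a real LM client; only records and applies its kwargs."""

    def __init__(self, **default_kwargs):
        self.kwargs = default_kwargs

    def __call__(self, prompt, **overrides):
        # Per-call overrides take precedence over the client's defaults.
        effective = {**self.kwargs, **overrides}
        return f"completion({prompt!r}, max_tokens={effective['max_tokens']})"


class SummarizeModule:
    """Exposes max_tokens as a module parameter without hardcoding any LM."""

    def __init__(self, max_tokens=32):
        self.max_tokens = max_tokens

    def __call__(self, lm, text):
        return lm(text, max_tokens=self.max_tokens)


lm = StubLM(max_tokens=1024)
short = SummarizeModule(max_tokens=32)
print(short(lm, "some input"))  # the module's max_tokens wins over the LM default
```

The caller supplies any LM they like; the module only pins down the generation parameters it actually cares about.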
Ok so basically I wrote something like this, and it seems to work for LMs that have a kwargs field:
from contextlib import contextmanager

import dspy

@contextmanager
def override_lm_params(**kwargs):
    lm = dspy.settings.lm
    old_kwargs = {param_name: lm.kwargs[param_name] for param_name in kwargs}
    try:
        for param_name, param_value in kwargs.items():
            lm.kwargs[param_name] = param_value
        yield
    finally:
        for param_name, param_value in old_kwargs.items():
            lm.kwargs[param_name] = param_value
When I run it, it correctly sets the parameters inside the context manager and then resets them upon closing.
Is this what dspy.context
was supposed to do? If not, doesn't it seem useful to add such a feature?
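For reference, the same pattern can be written generically for any mutable mapping, not just an LM's kwargs; a minimal, dependency-free sketch (override_params is a hypothetical name, not DSPy API), which also handles keys that did not exist before:

```python
from contextlib import contextmanager


@contextmanager
def override_params(mapping, **overrides):
    """Temporarily set keys in `mapping`, restoring old values on exit.

    Keys that did not exist before entering are removed again afterwards.
    """
    missing = object()  # sentinel to distinguish "absent" from "was None"
    saved = {k: mapping.get(k, missing) for k in overrides}
    mapping.update(overrides)
    try:
        yield mapping
    finally:
        for k, old in saved.items():
            if old is missing:
                mapping.pop(k, None)
            else:
                mapping[k] = old


kwargs = {"max_tokens": 1024}
with override_params(kwargs, max_tokens=512, temperature=0.0):
    assert kwargs["max_tokens"] == 512
    assert kwargs["temperature"] == 0.0
assert kwargs == {"max_tokens": 1024}  # temperature removed, max_tokens restored
```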
Tests
my contextmanager
with override_lm_params(max_tokens=512):
    print("kwargs in contextmanager")
    print(dspy.settings.lm.kwargs["max_tokens"])
print("kwargs after contextmanager")
print(dspy.settings.lm.kwargs["max_tokens"])

Output:

kwargs in contextmanager
512
kwargs after contextmanager
1024
dspy.context
with dspy.context(max_tokens=512):
    print("kwargs in contextmanager")
    print(dspy.settings.lm.kwargs["max_tokens"])
print("kwargs after contextmanager")
print(dspy.settings.lm.kwargs["max_tokens"])

Output:

kwargs in contextmanager
1024
kwargs after contextmanager
1024
@lambdaofgod would dspy.settings.config['lm'].kwargs['max_tokens'] = ..
cover the abstraction here? You wouldn't have to expose the defined lm and can still wrap modules using with dspy.context(lm=lm):
lm = dspy.OpenAI(model=..., max_tokens=32)
dspy.settings.configure(lm=lm)

with dspy.settings.context(lm=lm):
    result = module1(first_input)

dspy.settings.config['lm'].kwargs['max_tokens'] = 128

with dspy.settings.context(lm=lm):
    result = module2(first_input)
@arnavsinghvi11 what's the difference between this and my context manager?
@lambdaofgod They are quite similar, except that the approach I provided is already built into DSPy and wouldn't require any additional changes :).
That's the question - maybe someone else would find this util helpful? Or should we roll this into dspy.context?
I was confused that dspy.context doesn't work this way.
Makes sense @lambdaofgod . Feel free to push a PR that can better handle this behavior in dspy.context
without impacting existing settings!
I've tried this with two models, and it's actually far from obvious how overwriting parameters should work.
The problem is that, for example, ollama reloads the model on the server when we change the context size. I think it would actually be better to just tell users to run two models, because my proposal could result in more confusion (since whether a model gets reloaded depends on whether the LM object is an actual model or just a client).
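For readers in the same situation, the two-model approach can be sketched as follows (StubClient is a hypothetical stand-in for two real LM instances, e.g. two dspy.OllamaLocal clients; the point is that each client's kwargs stay fixed, so a backend like ollama never has to reload the model mid-run):

```python
class StubClient:
    """Stand-in for an LM client whose settings are fixed at construction."""

    def __init__(self, **kwargs):
        self.kwargs = dict(kwargs)  # never mutated after this point

    def __call__(self, prompt):
        return f"{prompt} -> max_tokens={self.kwargs['max_tokens']}"


# One client per configuration, instead of mutating a single client's kwargs.
short_lm = StubClient(max_tokens=32)
long_lm = StubClient(max_tokens=1024)

# Each step just picks the right client; since neither client's kwargs ever
# change, nothing triggers a server-side model reload.
print(short_lm("step 1"))
print(long_lm("step 2"))
```

With real DSPy LMs, the equivalent would be switching between the two instances via dspy.context(lm=...) per step, as suggested earlier in the thread.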