[WIP] Major refactor roadmap
DSPy has a small number (maybe 5-6) of extremely powerful concepts that have grown organically over the past year of open-source development.
Internally, it's time for a major refactor that will simplify things greatly and make sure everything works more smoothly. I have received a lot of interest from the community to contribute to this, so we just need to make sure the goals and milestones are clearly defined.
Potential leaders in this space include @CyrusOfEden and @stalkermustang (and, from the internal DSPy side, possibly @krypticmouse and @arnavsinghvi11, but I haven't checked with them), and there have been a lot of in-depth takes on this from @thomasahle, so I'm curious whether he's also interested in this more broadly.
Starting this issue just to collect the necessary tasks and prioritize them in the right dependency order.
Off the top of my head, I think we have to have:
- Cleaner LM abstraction that requires a lot less work to maintain and is clearer about the boundaries. The amazing @CyrusOfEden has already defined this on paper. This will include a cleaner "Backend" abstraction, which is a bit of a wrapper around LMs that does template adaptation and other things that Cyrus has in mind. (A rough sketch of what this boundary could look like follows this list.)
- Cleaner Signature abstraction. I think the push by @thomasahle here is perfectly on the right track: types, and immutable signatures. We just need to make more decisions about how far down Pydantic/types we go, and how far down, say, SGLang we go, versus having our own logic, etc. I do like outsourcing parsing logic and type logic, but we need to make sure it doesn't break existing features.
- Cleaner Modules. This is actually easy but needs to be streamlined. Predict and CoT need to be Modules. And they need to store instructions (not leave that to the signature). They need to handle multiple outputs more consistently. This can be any of us really, esp. @arnavsinghvi11, me, @krypticmouse, or @thomasahle, if any of these folks are interested.
- Cleaner Optimizers. Well, we can leave this to me; it's a final easy step once signatures and modules are good.
- More guidance on common patterns in the docs. We now finally have docs for all the key components and we have plenty of individual examples, but we do not have enough guidance on the common e2e workflows. This also partly includes clear guidance on what people should do with local LMs: Ollama for CPU, and TGI, SGLang, or vLLM(?) for GPUs? What about quantization, etc.?
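For concreteness, here is a very rough Python sketch of what that LM/Backend boundary could look like. This is not the design Cyrus has on paper; every class and method name below is hypothetical.

```python
from abc import ABC, abstractmethod


class LM(ABC):
    """Hypothetical minimal LM interface: one raw request method, no prompt logic."""

    def __init__(self, model: str, **default_kwargs):
        self.model = model
        self.default_kwargs = default_kwargs
        self.history = []  # raw request/response pairs, for inspection and tracing

    @abstractmethod
    def basic_request(self, prompt: str, **kwargs) -> list[str]:
        """Send one request to the provider and return the raw completions."""

    def __call__(self, prompt: str, **kwargs) -> list[str]:
        merged = {**self.default_kwargs, **kwargs}
        completions = self.basic_request(prompt, **merged)
        self.history.append({"prompt": prompt, "kwargs": merged, "completions": completions})
        return completions


class Backend:
    """Hypothetical wrapper around an LM that adapts a Signature into a concrete
    prompt (template adaptation) and parses raw completions back into fields."""

    def __init__(self, lm: LM, adapter):
        self.lm = lm
        self.adapter = adapter  # knows how to render/parse for a given signature

    def generate(self, signature, **inputs):
        prompt = self.adapter.format(signature, **inputs)
        raw_completions = self.lm(prompt)
        return [self.adapter.parse(signature, text) for text in raw_completions]
```

The point of the split is that provider classes only ever implement `basic_request`, while all prompting/parsing concerns live in the Backend layer.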
Thoughts? Other things we need?
- Production concerns? Streaming. Async.
- Docstrings.
- Various individual issues.
Dumping list of stuff:
- Assertions now has an open issue to collect things needing improvement
- Don't fail silently if a kwarg passed to Predict is forgotten or has an incorrect name
It would be great to have the same kind of LM abstraction for RMs.
I would create an RM class, like the existing LM class, that all the different third-party retriever models inherit from, instead of inheriting from the Retrieve module. This would allow creating different advanced retrieval techniques as modules that inherit from the Retrieve module, which would use any RM transparently (it already does, but it is confusing because the RM is currently just another Retrieve module).
Something like ChainOfThought inheriting from Predict which uses an LM underneath.
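To make that proposal concrete, here is a hypothetical sketch of the split; the RM base class, the WeaviateRM subclass, and the client.search call are all illustrative, not the current DSPy or Weaviate API.

```python
from typing import Optional


class RM:
    """Hypothetical base class for retrieval models, mirroring the LM abstraction:
    each provider implements one raw search call and nothing else."""

    def __init__(self, k: int = 3):
        self.k = k

    def basic_search(self, query: str, k: int) -> list[str]:
        raise NotImplementedError

    def __call__(self, query: str, k: Optional[int] = None) -> list[str]:
        return self.basic_search(query, k or self.k)


class WeaviateRM(RM):
    """Illustrative provider: only the raw call against the vector store lives here;
    the provider knows nothing about dspy.Retrieve or any other module."""

    def __init__(self, client, collection: str, k: int = 3):
        super().__init__(k=k)
        self.client = client
        self.collection = collection

    def basic_search(self, query: str, k: int) -> list[str]:
        # Pseudocode for whatever the vector-store client exposes.
        return self.client.search(self.collection, query, limit=k)


class Retrieve:
    """The module layer: advanced retrieval techniques subclass this and use any RM."""

    def __init__(self, rm: RM, k: int = 3):
        self.rm, self.k = rm, k

    def forward(self, query: str) -> list[str]:
        return self.rm(query, k=self.k)
```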
Yes, RMs too, but RMs can honestly just be function calls, so it's easier for people to deal with them for now.
Actually CoT and Predict should be dspy.Modules. CoT shouldn't inherit from Predict, that's a bad old decision that we'll change.
Sounds perfect! I was wondering if we can shift Example, Prediction and Completions classes to Pydantic.
Tensors are the only dtype in PyTorch, and Example is similarly the core dtype here; internally, everything the other two classes do could be wrapped in class methods.
This would be a pretty big migration and possibly not even backwards compatible, so we might want to think on this.
I disagree with that @krypticmouse . Predictions are already Examples anyway.
I'm for using as much Pydantic as we can here
Indeed, Predictions are basically Examples; they do have the from_completion method that Examples don't. That doesn't make much difference, yes, but I thought it could've become a class method of a Pydantic model.
Not a major issue tbh though, just a thought :)
Mostly just for better organization and readability.
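For what it's worth, here is a hypothetical sketch of what an Example/Prediction pair could look like on Pydantic; the field names and the from_completions classmethod are illustrative, not a drop-in replacement for the current classes.

```python
from pydantic import BaseModel, ConfigDict


class Example(BaseModel):
    """Hypothetical Pydantic take on Example: arbitrary named fields, plus
    bookkeeping of which of them are inputs."""

    model_config = ConfigDict(extra="allow")  # Example fields are dynamic
    input_keys: set = set()

    def with_inputs(self, *keys: str) -> "Example":
        return self.model_copy(update={"input_keys": set(keys)})


class Prediction(Example):
    """Hypothetical Prediction: an Example plus the raw completions it came from."""

    completions: list = []

    @classmethod
    def from_completions(cls, completions: list) -> "Prediction":
        # Expose the first completion's fields directly; keep the rest around.
        first = completions[0] if completions else {}
        return cls(completions=completions, **first)
```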
https://github.com/stanfordnlp/dspy/issues/392
Reliably keeping within token limits when compiling and running programs, without having to set the module config for a specific LM, is big for me to be able to deploy this to production. IMO, ideally the config could stay pretty much the same if you move from a 32k context to an 8k context. You'd just recompile and it'd automatically use fewer or shorter demos and whatever else it needed to.
My initial thoughts are that this has two main elements:
- Add something like an estimate_tokens method to LM. It'd take the same arguments as an LM call but would just return the tokens that would be used if you actually called it. Same idea as a 'what if' in infrastructure deployments: it takes the same parameters but doesn't run anything, it just tells you what it'd do if you actually ran it.
- Make use of the new estimate_tokens method when compiling to stay within token limits
The distinction between the two elements is because it's not just for compiling that it'd be useful. When we create a module or program, it'd be good to be able to estimate tokens so you can do things like limit the amount of context given by retrieval.
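A rough sketch of what such an estimate_tokens method might look like, assuming a tokenizer like tiktoken is available for the model in question; the OpenAILM class, its defaults, and the fits helper are hypothetical, not existing DSPy API.

```python
import tiktoken


class OpenAILM:
    """Hypothetical LM subclass, here only to illustrate the proposed methods."""

    def __init__(self, model: str = "gpt-3.5-turbo", max_context: int = 8192):
        self.model = model
        self.max_context = max_context  # illustrative default

    def estimate_tokens(self, prompt: str, **kwargs) -> int:
        """Count the prompt's tokens without calling the API: the 'what if' /
        dry-run idea described above."""
        encoding = tiktoken.encoding_for_model(self.model)
        return len(encoding.encode(prompt))

    def fits(self, prompt: str, max_new_tokens: int = 0) -> bool:
        """True if prompt plus the requested generation budget stays within context."""
        return self.estimate_tokens(prompt) + max_new_tokens <= self.max_context
```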
I want to echo your point @okhat about the instructions/prompts in modules. I think right now they are a little spread out in the code as strings in various places that sometimes get appended together. If that could be elevated in terms of abstractions and/or made clearer, it might even make it easier to analyse a module and potentially perform some interesting transformations on it later down the line. I don't quite think we need to go as far as the prompt-first abstractions that LangChain offers, but prompting is not something we can completely divorce this from; doing it in a more organised fashion that allows for future analysis could be useful?
Integrating 4 (optimizers) in the thinking early on might be necessary, since they are what put the biggest strain on the API. We need to think about what features they require to be available, such as
- Changing the signatures
- Storing traces/examples
- Reconfiguring the LMs
and so on, and find the simplest Predict classes etc. that satisfy those (something like the rough sketch below).
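To make those requirements concrete, here is a hypothetical sketch of the surface an optimizer might need from Predict and friends; every name below is illustrative.

```python
from typing import Any, Protocol


class OptimizableModule(Protocol):
    """Hypothetical protocol: the minimum surface an optimizer needs from Predict."""

    signature: Any   # readable, and replaceable with a rewritten copy
    demos: list      # few-shot examples the optimizer is allowed to write
    lm: Any          # per-module LM config the optimizer can swap out

    def with_signature(self, signature: Any) -> "OptimizableModule":
        """Return a copy of the module with a new (e.g. reworded) signature."""
        ...

    def trace(self) -> list:
        """Return the recorded input/output traces from recent calls."""
        ...
```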
Assertions is another example of an "involved" feature that needs a lot of support, but hopefully not a lot of special-casing. Right now there's the whole new_signature keyword argument that gets used sometimes and seems to have been introduced specifically for Retry to use.
#368 (or at least its tests) should get merged in before we embark on any major refactor, because we ought to have tests to ensure we don't introduce any unintended regressions.
Hey team, some notes so far:
- Backend refactor sounds great!
- Indeed, this is an interesting one.
- Can’t comment on how this is currently configured.
- Awesome, certain the team you’ve put together will come up with something interesting for this! Already super love the BayesianSignatureOptimizer.
- Ah fantastic, sorry for the delay here — will touch up on the WeaviateRM.
I like the idea of extending RMs to be more than function calls, but I do think that interfacing, for example, the Weaviate python client with the module’s forward pass will probably work fine for a while.
Keeping within token limits sounds amazing. The LMs have an internal max_tokens state that you could probably just multiply by the upper bound on the number of calls in your module’s forward passes. Compiling is another story; I don’t know enough about DSPy yet to comment on it.
Still have a couple more responses to read; will update with mentions.
I'll try to kick off the backend refactor Saturday, if not, @isaacbmiller is down to have the first part ready by Tuesday–Wednesday of next week
Recently, Ollama released an OpenAI-compatible API. Other companies like Mistral AI also offer APIs that follow OpenAI specifications. Additionally, there are projects like LiteLLM that provide API-compatibility layers (e.g., using proxies).
So I think that LM abstraction could potentially just be a single thin wrapper around the OpenAI API specification. Is this a viable option?
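As a rough illustration of how thin that wrapper could be, here is a sketch that routes everything through litellm.completion; the ThinLM class and its defaults are hypothetical, while the call itself follows the OpenAI chat format, with api_base pointing at any OpenAI-compatible server.

```python
from typing import Optional

import litellm


class ThinLM:
    """Hypothetical thin wrapper: one OpenAI-style chat call, with the provider
    chosen by the model string (e.g. "gpt-4", "ollama/llama2", "mistral/mistral-tiny")."""

    def __init__(self, model: str, api_base: Optional[str] = None, **default_kwargs):
        self.model = model
        self.api_base = api_base  # e.g. a local OpenAI-compatible server
        self.default_kwargs = default_kwargs

    def __call__(self, prompt: str, **kwargs) -> list[str]:
        response = litellm.completion(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            api_base=self.api_base,
            **{**self.default_kwargs, **kwargs},
        )
        return [choice.message.content for choice in response.choices]
```

For a local server, something like ThinLM("ollama/llama2", api_base="http://localhost:11434") would route the same call to Ollama's OpenAI-compatible endpoint.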
@S1M0N38 you would still need the thin wrapper though to pass in optional arguments with kwargs.
@S1M0N38 @CShorten just came across LiteLLM today — and it seems like a home run for inference (not fine-tuning). Am I missing anything?
@S1M0N38 you would still need the thin wrapper though to pass in optional arguments with kwargs.
@CShorten what are the optional kwargs that differ from provider to provider and are needed by DSPy? (e.g. I think temperature is one of those, needed to control the amount of prompt "exploration", isn't it?) Here, for example, are the input params that LiteLLM supports for different providers.
... just came across LiteLLM today — and it seems like a home run for inference (not fine-tuning). Am I missing anything?
@CyrusOfEden I believe you're correct, but upon examining the code in dsp/modules/[gpt3|cohere|ollama].py, it appears that the only requests being made are HTTP requests to the inference endpoint, namely /completion, chat/completion, /api/generate, api/chat, etc. These are all inference requests for text. Could you elaborate on the fine-tuning you mentioned?
I'm not entirely familiar with the inner workings and requirements of this framework, so everything I've mentioned may not be feasible. Therefore, please take my statements with a grain of salt. In my opinion, for a project like this, it's best to focus on core concepts rather than implementing numerous features. The idea is to defer those to other libraries or leave them to the user to implement, guided by high-quality documentation.
@S1M0N38 I think the way multiple generations are sampled -- for example Cohere has num_generations but the google.generativeai API has no such option. Probably little nuances like this, but the chart you shared is great, I trust your judgment on this.
@S1M0N38 yup, LiteLLM would be good for inference — and currently LMs don't have a .finetune method but we want that later.
I'm new to this library, I'd love to see more support for production deployment, my prioritized wish list would be:
- Deployment tools/docs (clarity for: how does this fit into your CI/CD?)
- Async
- Type annotations
- Doc strings
- Streaming
@bencrouse those are definitely on the roadmap — I think the focus right now is reliability / API / typed outputs / everything Omar originally mentioned and then afterwards we want to do some thinking through what it means to productionize this (async/streaming/deployment/etc.)
+1 for supporting all OpenAI-compatible local LLM servers for inference, and not just Ollama. I think this will increase adoption because a lot of "application developers" of LMs who are not ML experts use tools like LM Studio, GPT4All, etc.
Hi I'm the litellm maintainer - what's missing in litellm to start using it in this repo ? @CyrusOfEden @S1M0N38
Happy to help with any issues / feature requests - even minor ones
Hi I'm the litellm maintainer - what's missing in litellm to start using it in this repo ? @CyrusOfEden @S1M0N38
Happy to help with any issues / feature requests - even minor ones
Tbh I think you're good for now, great to have your support 🔥
@ishaan-jaff how does tool use work with models that don't necessarily support it?
Would be really cool if I could use LiteLLM for tool use for whatever model -- is there a table for tool use support somewhere?
Separately, is LiteLLM able to integrate something like Outlines to support tool use for models that don't natively support it?
Interesting question, I suspect digging into how dspy.ReAct implements the dspy.Retrieve tool could be a good start to understanding how to interface all tools @CyrusOfEden.
Maybe this is the argument for why these tools should be integrated more deeply into DSPy rather than used externally as calls in the forward pass (or there is some kind of API contract with ReAct for passing in arbitrary functions as well).
Yeah I’m in favor of minimal abstractions around tools. I think a tool being just a function whose docstring and arguments and name can be used by ReAct can achieve this
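That could be as small as reading the function's metadata with the standard library. A hypothetical helper along those lines (not the existing dspy.ReAct interface):

```python
import inspect


def describe_tool(fn) -> dict:
    """Hypothetical helper: expose a plain function as a ReAct-style tool using
    only its name, docstring, and argument names/annotations."""
    signature = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "args": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }


def search_wikipedia(query: str, k: int = 3) -> list:
    """Search Wikipedia and return the top-k passage strings."""
    ...


print(describe_tool(search_wikipedia))
# {'name': 'search_wikipedia',
#  'description': 'Search Wikipedia and return the top-k passage strings.',
#  'args': {'query': 'str', 'k': 'int'}}
```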
The coupling between DSPy and "tools", in the ReAct sense, should be as light as possible in my opinion. There could, however, be some annotations that make optimisation easier; for instance, to enable backtracking with it.
More generally, I feel like some of the coupling with other technologies like RMs may be a little too strong at the moment from a software engineering perspective. I understand the reason behind it, as it unlocks a lot of cool things like retrieveEnsemble, but it does feel a little specific at the moment.
My vague intuition is that it would be great if DSPy could have more generic and well-defined boundaries when it comes to the backend components (hopefully backed by classes and types, and less so by magic strings/attributes). This comment might betray more of a personal dispreference for "duck" typing, but regardless, a more defined internal operating mechanics/schema could make future development of passes and features a lot less burdensome.
Yeah I’m in favor of minimal abstractions around tools. I think a tool being just a function whose docstring and arguments and name can be used by ReAct can achieve this
LiteLLM on it — really liking what I'm finding in this repo — they have a util for converting from a Python function to a JSON schema tool def [0]
[0] https://litellm.vercel.app/docs/completion/function_call#litellmfunction_to_dict---convert-functions-to-dictionary-for-openai-function-calling