Backend Refactor [WIP]

Open CyrusNuevoDia opened this issue 1 year ago • 10 comments

CyrusNuevoDia avatar Feb 21 '24 01:02 CyrusNuevoDia

Where does the responsibility sit for parsing the language model output into Signature-compliant structured output?

It is my understanding that the BaseBackend would be responsible for the following:

Given the Signature, demos, and the Signature inputs, generate the examples and templates before passing to the language model, and return Signature-compliant output.

However, any given language model can return either of the following:

  • Tool Use/JSON Format
  • Raw Completion Strings

Would we expect the BaseLM to have individual functions for JSON/String outputs, which would then be called/parsed as necessary by the BaseBackend? Or would the responsibility for returning Signature-compliant output sit entirely at the BaseLM level?

KCaverly avatar Feb 27 '24 16:02 KCaverly

> Would we expect the BaseLM to have individual functions for JSON/String outputs, which would then be called/parsed as necessary by the BaseBackend? Or would the responsibility for returning Signature-compliant output sit entirely at the BaseLM level?

Not really, in essence —

class ToolBackend(BaseBackend):
  def __call__(...) -> list[dict[str, Any]]:
    return self.lm(..., tools=[...], tool_choice=...)

class JSONBackend(BaseBackend):
  def __call__(...):
    return self.lm(..., response_format={"mode": "json"})

class TemplateBackend(BaseBackend):
  def __call__(...):
    template = ...
    response = self.lm(...)
    results = [template.extract(choice) for choice in response.choices]
    return results

CyrusNuevoDia avatar Feb 27 '24 19:02 CyrusNuevoDia

One thing we may have to tackle is Template variations on a potentially per-backend basis.

I.e., for the JSONBackend, the guidelines have to be augmented with JSON-specific instructions.

One option would be to abstract away Template: have a PromptTemplate, which just leverages text generation, and a JSONTemplate, which accommodates JSON-specific instructions and knows how to extract specifically from JSON output.

Currently, I've just got one Template with an argument for is_json, but I could see some value in introducing more Templates; it would also allow users to override all template functionality if they want to. You would simply pass a Template to the `Backend`.
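
A minimal sketch of what that split could look like (class and method names here are illustrative, not the actual implementation):

import json
import typing as t


class Template:
    """Renders a prompt from a signature and extracts output fields back out."""

    def __init__(self, signature):
        self.signature = signature

    def render(self, example) -> str:
        raise NotImplementedError

    def extract(self, text: str) -> dict[str, t.Any]:
        raise NotImplementedError


class PromptTemplate(Template):
    def render(self, example) -> str:
        # Plain-text guidelines: one "FieldName: value" block per field.
        ...

    def extract(self, text: str) -> dict[str, t.Any]:
        # Scan the completion for each output field's prefix.
        ...


class JSONTemplate(Template):
    def render(self, example) -> str:
        # Same guidelines, augmented with JSON-specific instructions.
        ...

    def extract(self, text: str) -> dict[str, t.Any]:
        # The completion is expected to be a JSON object keyed by output field.
        return json.loads(text)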

KCaverly avatar Mar 07 '24 15:03 KCaverly

I just merged main into the branch, but it seems like there were lots of conflicts. I resolved most of them based on my own knowledge, but please take another look at them.

I have fixed the lint issues but will leave the test fixes to you; there are some new tests that may fail.

ammirsm avatar Apr 10 '24 06:04 ammirsm

Okay - This guy is good to merge.

Note on Breaking Changes:

This should be entirely backwards compatible with main. I've run a series of the example notebooks to ensure compatibility, along with passing our entire test suite. All of your existing DSPy programs should continue to operate in the same way. I would test this the same way we have been, rather than building directly with the new backends.

How the New Backends Work:

The external API for the Backends looks like the following:

import typing as t


class BaseBackend:
    def prepare_request(self, signature: Signature, example: Example, **kwargs) -> dict:
        # Build the LM-ready arguments (prompt/messages, generation
        # parameters, response format) from the signature, demos, and inputs.
        ...

    def generate_request(self, **kwargs) -> t.Any:
        # Call the underlying LM with the prepared (and fully
        # user-overridable) arguments; return the raw response.
        ...

    def process_response(self, signature: Signature, example: Example, response: t.Any, **kwargs) -> Completions:
        # Extract Signature-compliant output fields from the raw response.
        ...

Providing this interface to the user allows them to fully override all arguments going to the LM, and allows for dynamic extraction based on the prompt, etc. We went through a lot of iterations trying to make this more general. Unfortunately, the extraction process (generated content -> arguments) is still highly tied to the prompt provided, so we've started with a general API; hopefully something more efficient will emerge as the space evolves. To start, we've provided two backends: a JSONBackend and a TextBackend. The TextBackend operates the same as the current LMs, with a generation prompt and templated extraction. The JSONBackend is net new, leveraging the JSON mode available through LiteLLM to provide JSON directly.
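
Concretely, a backend call just composes those three steps. A rough sketch of the wiring (the __call__ method below is illustrative, not the exact implementation):

class BaseBackend:
    def __call__(self, signature, example, **kwargs) -> Completions:
        # 1. Build LM-ready arguments from the signature, demos, and inputs.
        request = self.prepare_request(signature, example, **kwargs)

        # 2. Call the underlying LM with the (fully overridable) arguments.
        response = self.generate_request(**request)

        # 3. Extract Signature-compliant fields from the raw response.
        return self.process_response(signature, example, response, **kwargs)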

The Backend internals also allow for "healing", i.e., retrying with partially complete generations, along with caching. All new backend calls should be cached, without any additional maintenance needed from the team.
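
A conceptual sketch of the healing loop (the helper below is hypothetical; it assumes the backend is callable as sketched above and that Example.copy merges in new fields):

def call_with_healing(backend, signature, example, max_retries=3):
    # Hypothetical sketch: retry generation, seeding each attempt with the
    # output fields that already parsed, so the LM only completes the rest.
    completion = {}
    for _ in range(max_retries):
        completion = backend(signature, example)
        parsed = {f: completion[f] for f in signature.output_fields if f in completion}
        if len(parsed) == len(signature.output_fields):
            return completion  # fully Signature-compliant
        example = example.copy(**parsed)  # carry partial progress forward
    return completion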

Where should we focus our review:

  1. Predict Module
  • The way this is built, if you have configured a Backend, Predict will leverage the backend for generation; otherwise it will fall back on the LM (see the sketch after this list).
  • When using the LM, the functionality should be no different.
  • Additionally, when using the TextBackend and the LM, performance should be basically the same, as the prompt format and generation methods are nearly identical.
  2. Prediction/Completions Primitives
  • While building this out, we found a series of inconsistencies in how Completion objects were constructed. Without diving too much into detail, it was possible for two fields in the same generation to be provided in two separate "generated" responses.
  • All the external APIs should be fairly similar (and have been updated) with tests passing; this was primarily meant to tighten up the internal consistency.
  • For optimizers in which the internal Completions state is manipulated (COPRO/MIPRO, Bootstrap), we may want an extra pair of eyes. But all my tests look great.
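
For item 1, the dispatch looks roughly like this (schematic only: reading the backend off dspy.settings is shown simplified, and the _generate_with_lm helper name is made up):

import dspy


class Predict(dspy.Module):
    def forward(self, **kwargs):
        backend = getattr(dspy.settings, "backend", None)
        if backend is not None:
            # New path: the backend owns prompting, generation, and extraction.
            return backend(self.signature, **kwargs)
        # Fallback: the existing LM path, behaviorally unchanged.
        return self._generate_with_lm(dspy.settings.lm, **kwargs)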

Overall

This PR is getting Mega, and we've just been spending cycles catching up with main, so I would propose we review/merge, and then continue to harden and tweak once this guy is in.

Thanks all for the help on this - please let me know if we have any questions/concerns.

KCaverly avatar Apr 17 '24 19:04 KCaverly

I've been in the loop with this through dev and it looks good to me. I think the broad brush strokes are there, and I agree with you that we should merge to main and then do future PRs to refine.

My proposal is to operate optimistically here — unless there are any objections from @okhat or @thomasahle within the next 48h then we merge.

Do we have a version bump / process to publish a new version? @okhat are you the one that manually publishes new versions?

CyrusNuevoDia avatar Apr 17 '24 21:04 CyrusNuevoDia

@okhat @arnavsinghvi11 are we good to merge this one?

KCaverly avatar Apr 25 '24 17:04 KCaverly

Hey everyone, just catching up on this. Let's prioritize merging very soon (let's say within 2 days).

The biggest problem right now is that I gather (though I'm not sure) that 90% of the files changed are just formatting. Is this accurate? It makes this PR harder to review. Do we need these formatting changes? Isn't ruff supposed to take that out of the story? Basically, I'd love to review a version of this PR that's mostly or entirely actual updates, so nothing slips.

okhat avatar Apr 28 '24 14:04 okhat

That is correct; we did our best, but it's messy. Without rewriting this PR from scratch, I don't think we can make it cleaner. The repo went through so many major changes during this period that the merges got a bit clobbered.

I've outlined the areas to review above. It should just be the modeling/predict and primitives/prediction modules. I've extensively tested this and it should be fine.

I would love to see this merged within 48 hours. It's not perfect, and there are additional issues we can fix in follow-ups, but existing functionality remains unchanged.

With CI and consistent formatting, I think all future changes will be much better scoped and focused.

Please let me know what's needed to get this over the line.

KCaverly avatar Apr 28 '24 15:04 KCaverly

Hi @okhat, any update on when this can be merged? We use Instructor for a few things, and I'd like to create an InstructorBackend for ourselves instead of creating a new TypedInstructorPredictor and TypedInstructorChainOfThought.
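
Roughly what I have in mind, as a sketch only (the build_prompt helper is hand-waved, and I'm assuming Instructor's from_openai/response_model pattern plus a Pydantic model derived from the signature's output fields):

import instructor
from openai import OpenAI
from pydantic import create_model


class InstructorBackend(BaseBackend):
    def __init__(self, model="gpt-4-turbo"):
        self.client = instructor.from_openai(OpenAI())
        self.model = model

    def __call__(self, signature, example, **kwargs):
        # Build a Pydantic model from the signature's output fields so
        # Instructor can validate (and retry) the structured response.
        OutputModel = create_model(
            "OutputModel", **{name: (str, ...) for name in signature.output_fields}
        )
        result = self.client.chat.completions.create(
            model=self.model,
            response_model=OutputModel,
            # build_prompt is hand-waved: it would render the signature's
            # instructions, demos, and inputs into a user message.
            messages=[{"role": "user", "content": build_prompt(signature, example)}],
        )
        return result.model_dump()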

samiur avatar May 16 '24 21:05 samiur