Deployment of a compiled program
Currently, compiled programs are not async, so they cannot be served efficiently from a Python server. It would be useful to merge the PRs that aim to add async support across the dspy library.
This could also involve adding nurseries (structured-concurrency task groups) so that an ensemble of requests can be awaited simultaneously.
Thanks @sutyum. Is the main target here serving queries in parallel?
Currently we do this with threading; DSPy is thread-safe. Does async offer additional benefits for you?
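For reference, here is a minimal sketch of that threading approach using only the standard library; compiled_program below is a stand-in stub, not a real DSPy module:

import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a compiled DSPy module; since DSPy modules are thread-safe,
# one instance can be shared across worker threads.
def compiled_program(question):
    time.sleep(1.0)  # simulates waiting on the LM provider
    return f"answer to: {question}"

questions = [f"question {i}" for i in range(16)]

# 16 IO-bound calls finish in roughly 2 seconds with 8 workers,
# instead of ~16 seconds sequentially.
with ThreadPoolExecutor(max_workers=8) as pool:
    predictions = list(pool.map(compiled_program, questions))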
Serving programs
Threads vs Asyncio
LM programs spend most of their execution time waiting for responses from other machines: they are IO-heavy rather than compute-heavy. Async IO tends to perform particularly well when a large chunk of execution time is spent waiting; rather than busy-waiting, an async executor can carry out other tasks in the meantime. There are also limits on how many OS threads can be created on a given CPU, which is far fewer than the number of requests one could serve if each request only started a lightweight asyncio task (a green thread).
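To illustrate the point, here is a minimal asyncio sketch with asyncio.sleep standing in for the network wait of an LM call; no DSPy or server code is involved:

import asyncio

async def lm_call(i):
    # Stand-in for an LM request: the task is suspended while "waiting",
    # so the event loop can run thousands of these on a single OS thread.
    await asyncio.sleep(1.0)
    return f"response {i}"

async def main():
    # 1000 concurrent "requests" complete in about 1 second total,
    # without creating 1000 OS threads.
    return await asyncio.gather(*(lm_call(i) for i in range(1000)))

results = asyncio.run(main())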
Compilation vs Execution
Also worth considering is the case of LM programs that run for a very long time (hours, days, months); such scenarios would benefit from other forms of distributed execution. For instance, SuperAGI, an agent orchestration project, uses a message broker to break the LM call DAG into a workflow, with each call executed in a distributed manner.
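As a rough sketch of the broker idea (ignoring dependencies between steps for brevity), the snippet below uses an in-process queue.Queue as a stand-in for a real message broker such as RabbitMQ or Redis; in production each worker could be a separate process or machine:

import queue
import threading

broker = queue.Queue()   # stand-in for a real message broker
results = {}

def lm_step(name, payload):
    # Placeholder for one LM call in the DAG.
    return f"{name} processed {payload}"

def worker():
    while True:
        task = broker.get()
        if task is None:  # shutdown signal
            break
        name, payload = task
        results[name] = lm_step(name, payload)
        broker.task_done()

workers = [threading.Thread(target=worker, daemon=True) for _ in range(4)]
for w in workers:
    w.start()

# Publish the nodes of the (flattened) LM call DAG as independent tasks.
for step in ["summarize", "extract_entities", "draft_answer"]:
    broker.put((step, "some input"))

broker.join()
for _ in workers:
    broker.put(None)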
Bring your own executor
It seems we are still on the lookout for a flexible execution model for such compiled programs. Just putting my thoughts here to continue the discussion on this open question.
Do we need this?
flowchart LR
D[Dspy program] --> C[DAG of compiled programs]
C --> E[Bring your own executor]
One idea that was brought up recently was to compile with dspy and execute with a separate executor system (such as langchain). This sort of approach could be useful to keep dspy focused on LM programming primitives and constructs rather than the various choices one can make for execution.
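To make the "DAG of compiled programs" box concrete, here is a hypothetical sketch of what such an exported artifact could look like: a plain data structure of named steps, each holding an optimized prompt template and its dependencies, that any executor (langchain, a workflow engine, a hand-rolled loop) could consume. None of these names are DSPy APIs:

from dataclasses import dataclass, field

@dataclass
class CompiledStep:
    # One node of the compiled program: an optimized prompt template
    # plus the upstream steps whose outputs it consumes.
    name: str
    prompt_template: str
    depends_on: list = field(default_factory=list)

# Hypothetical export of a two-step compiled program.
dag = [
    CompiledStep("retrieve_context", "Given the question: {question}\nList relevant facts."),
    CompiledStep(
        "answer",
        "Context: {retrieve_context}\nQuestion: {question}\nAnswer concisely.",
        depends_on=["retrieve_context"],
    ),
]

def execute(dag, inputs, call_lm):
    # call_lm is whatever executor you bring: an openai call, a langchain
    # chain, a task-queue submission, and so on.
    outputs = dict(inputs)
    for step in dag:  # assumes the list is already topologically sorted
        prompt = step.prompt_template.format(**outputs)
        outputs[step.name] = call_lm(prompt)
    return outputs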
This sounds super cool, similar to #338. I think the broader question is "how does DSPy fit into the productionization workflow", and that's something we can think more about to come up with an elegant approach.
Posting here so I get notified of updates. I'd be interested in getting compilation to run on something like Hamilton.
@CyrusOfEden Could sglang as the executor be all that we need?
@sutyum how do you imagine that working?
@CyrusOfEden Could sglang as the executor be all that we need?
That doesn't sound all that useful to me.
Deploying a "compiled" dspy program to me requires publishing a graph comprised of the optimized prompts generated. Then you can take that and convert it into whatever framework you want.
So you just take the compiled prompts from dspy and run them via, for example, the openai lib?
As a first target that would be great!
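A hedged sketch of that first target: treat the optimized prompt pulled out of a compiled program as a plain string (hard-coded below for illustration) and serve it with the openai client directly, with no DSPy code at serving time:

from openai import OpenAI

# Pretend this string was extracted from a compiled DSPy program's saved
# state (optimized instructions plus few-shot demos rendered out).
compiled_prompt = (
    "You are a concise QA assistant.\n"
    "Question: {question}\n"
    "Answer:"
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def serve(question):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model works here
        messages=[{"role": "user", "content": compiled_prompt.format(question=question)}],
    )
    return response.choices[0].message.content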
So langchain is the way forward. Also, when we say a graph, how about using this: https://topoteretes.github.io/cognee/
cognee ("Deterministic LLM Outputs for AI Engineers"): an open-source framework for loading and structuring LLM context to create accurate and explainable AI solutions using knowledge graphs and vector stores.
Here's how LangChainPredict and LangChainModule could be enhanced to support streaming and tracing:
Streaming Support
class LangChainPredict(Predict):
    def forward(self, **kwargs):
        stream_output = kwargs.pop("stream_output", False)
        # prompt and signature are assumed to be built earlier in forward(),
        # exactly as in the existing implementation.
        if stream_output:
            # Ask the LangChain LLM to stream and wrap the chunk iterator.
            output = self.langchain_llm.invoke(prompt, streaming=True)
            return StreamedPrediction(output, signature=signature)
        else:
            output = self.langchain_llm.invoke(prompt)
            return Prediction.from_completions(output, signature=signature)
Changes:
- Add a stream_output argument to forward()
- Pass streaming=True to the LangChain LLM when stream_output is set
- Return StreamedPrediction instead of Prediction for streaming output
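StreamedPrediction is not an existing DSPy class; a minimal sketch of what it could look like, assuming the LangChain LLM returns an iterator of text chunks when streaming is enabled:

class StreamedPrediction:
    """Hypothetical wrapper that yields chunks as they arrive and
    accumulates the full completion for later field parsing."""

    def __init__(self, chunks, signature=None):
        self._chunks = chunks         # iterator of text chunks from the LLM
        self.signature = signature
        self.completion = ""

    def __iter__(self):
        for chunk in self._chunks:
            self.completion += chunk  # keep the full text for parsing
            yield chunk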
Tracing Support
import logging

logger = logging.getLogger(__name__)

class LangChainPredict(Predict):
    def forward(self, **kwargs):
        enable_tracing = kwargs.pop("enable_tracing", False)
        # prompt and signature are assumed to be built earlier in forward(),
        # as in the existing implementation; set_tracing()/get_trace() are
        # hypothetical hooks on the wrapped LangChain LLM.
        if enable_tracing:
            # Enable tracing in LangChain
            self.langchain_llm.set_tracing(True)
        output = self.langchain_llm.invoke(prompt)
        if enable_tracing:
            # Access and log the trace
            trace = self.langchain_llm.get_trace()
            logger.debug(f"LangChain Trace: {trace}")
        return Prediction.from_completions(output, signature=signature)
Changes:
- Add an enable_tracing argument to forward()
- Enable tracing on the LangChain LLM when enable_tracing is set
- Retrieve and log the trace after invoking the LLM
The LangChainModule class can expose these same options and pass them through to its underlying LangChainPredict instances.
With these enhancements, DSPy programs using LangChain components will be able to leverage streaming and tracing capabilities, enabling better observability and interactivity in production deployments.
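A hypothetical usage example, assuming the flags above are plumbed through LangChainModule as described (module construction is elided; both kwargs are proposed, not existing, options):

# module is assumed to be a LangChainModule built on the proposed LangChainPredict.
for chunk in module(question="What is DSPy?", stream_output=True):
    print(chunk, end="", flush=True)   # stream tokens to the client as they arrive

traced = module(question="What is DSPy?", enable_tracing=True)  # trace is logged at DEBUG level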
so langchain is the way forward
I think it's just 'a' way, not 'the' way. ;)
so you just take compiled prompts from dspy and run them via for example openai lib ?
Dspy supports several other objects in its graphs, which I think makes this a little trickier. How do you encapsulate a retrieval model in the compiled prompts, for example?
Posting here so I get notified on updates! I would love it if we could get compilation running on something like Dagster.
@sarora-roivant wanna send me a DM on LinkedIn? URL in bio
Just sent you a connection request.