
What's the best way to use Pandas in Program of Thought?

Open giresg opened this issue 9 months ago • 0 comments

I want to build an agent that answers questions using data stored in a pandas DataFrame (similar to LangChain's DataFrame agent, but customised to my needs).

I have tried several approaches but keep running into errors.

A minimal reproducible example:

import dspy
import pandas as pd

lm = dspy.AzureOpenAI(...)
dspy.settings.configure(lm=lm)

pot = dspy.ProgramOfThought("question, data -> answer")

pot(question="Calculate the sum of column X", data=pd.DataFrame({"X": [1, 2, 3]}))

which raises this error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The error seems to come from this part of the code, but I cannot tell why (I also tried customised signatures and got the same error):

if not is_demo:
    has_value = [
        field.input_variable in example
        and example[field.input_variable] is not None
        and example[field.input_variable] != ""
        for field in self.fields
    ]
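
The same failure can be reproduced without dspy. This is a minimal sketch that assumes only pandas: comparing a DataFrame with `""` produces another DataFrame of booleans rather than a single `bool`, and any later truth test on that result raises the error above. The snippet does not show how `has_value` is consumed afterwards, so the `any(...)` call below is an assumption used only to trigger the truth test.

```python
import pandas as pd

df = pd.DataFrame({"X": [1, 2, 3]})

# Element-wise comparison: the result is a DataFrame of booleans, not a bool.
comparison = df != ""
print(type(comparison).__name__)

# Hypothetical stand-in for the check in the snippet above:
# `and` returns its last operand, so the list ends up holding a DataFrame.
has_value = [df is not None and df != ""]

try:
    any(has_value)  # truth-testing the DataFrame raises ValueError
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this is indeed the cause, any input whose `!=` comparison is element-wise (DataFrames, Series, NumPy arrays) would be expected to hit the same problem.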

After some trial and error, I arrived at this working code:

cot = dspy.TypedChainOfThought("question:str, data:list -> answer:str")

cot(question="Calculate the sum of column X", data=pd.DataFrame({"X": [1, 2, 3]}))

which gives the correct answer but feels hacky and is not exactly what I need: I want to use PoT instead of CoT.
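
One possible direction, not confirmed as the recommended dspy approach, is to render the DataFrame as plain text before passing it as an input field, so that the field holds a `str` and an emptiness check yields an ordinary boolean. The sketch below uses only pandas. The variable names are illustrative, and whether ProgramOfThought can then work usefully with the text is untested here.

```python
import pandas as pd

df = pd.DataFrame({"X": [1, 2, 3]})

# Render the table as plain text so the input value is a str, not a DataFrame.
data_text = df.to_string(index=False)

# Comparing a str with "" gives a plain bool, so this check cannot be ambiguous.
is_non_empty = data_text is not None and data_text != ""

print(is_non_empty)
print(data_text)
```

The trade-off is that the generated program would see text rather than a live DataFrame object, so it would need to parse the table itself.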

Is Pandas supported in dspy? If yes, what's the recommended way to use it?

giresg · May 10 '24 06:05