What's the best way to use Pandas in Program of Thought?
I want to build an agent that answers questions using data stored in a pandas DataFrame (similar to LangChain's DataFrame agent, but customised to my needs).
I have tried several approaches but keep hitting errors.
A minimal reproducible example:
import dspy
import pandas as pd
lm = dspy.AzureOpenAI(...)
dspy.settings.configure(lm=lm)
pot = dspy.ProgramOfThought("question, data -> answer")
pot(question="Calculate the sum of column X", data=pd.DataFrame({"X": [1, 2, 3]}))
which raises this error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The error seems to come from this part of the dspy code, but I cannot tell why (I also tried customised signatures and got the same error):
if not is_demo:
    has_value = [
        field.input_variable in example
        and example[field.input_variable] is not None
        and example[field.input_variable] != ""
        for field in self.fields
    ]
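For what it's worth, here is a minimal sketch of how a check shaped like the one above might produce that message. It assumes the DataFrame reaches the `!= ""` comparison unchanged. It uses only pandas and does not call dspy, so the `has_value` list here is a stand-in for illustration, not the library's actual control flow.

```python
import pandas as pd

df = pd.DataFrame({"X": [1, 2, 3]})

# `!=` between a DataFrame and a string is element-wise: it yields a
# DataFrame of booleans, not a single True/False.
comparison = df != ""
print(type(comparison).__name__)

# The chained `and` hands back that DataFrame as its last operand...
has_value = [df is not None and df != ""]

# ...and the ValueError appears once something asks for its truth value.
try:
    any(has_value)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Illustration only: a plain-text rendering of the same data compares
# against "" like any other string. Whether this suits PoT is untested here.
as_text = df.to_csv(index=False)
text_check = as_text is not None and as_text != ""
print(text_check)
```

If this is indeed the cause, any input whose `!=` operator is element-wise (a NumPy array, for example) would presumably hit the same problem, while a string or a list would not.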
After some trial and error, I arrived at this working code:
cot = dspy.TypedChainOfThought("question:str, data:list -> answer:str")
cot(question="Calculate the sum of column X", data=pd.DataFrame({"X": [1, 2, 3]}))
which gives the correct answer but feels hacky and is not exactly what I need: I want to use PoT instead of CoT.
Is Pandas supported in dspy? If yes, what's the recommended way to use it?