Causal program-aided language (CPAL) chain
Motivation
The problem with the PAL approach is that it hallucinates on a math problem with a nested chain of dependence. The CPAL chain includes causal structure to fix the hallucination.
For example, using the below word problem, PAL answers with 5, and CPAL answers with 13.
"Tim buys the same number of pets as Cindy and Boris."
"Cindy buys the same number of pets as Bill plus Bob."
"Boris buys the same number of pets as Ben plus Beth."
"Bill buys the same number of pets as Obama."
"Bob buys the same number of pets as Obama."
"Ben buys the same number of pets as Obama."
"Beth buys the same number of pets as Obama."
"If Obama buys one pet, how many pets total does everyone buy?"
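The correct answer of 13 can be checked by hand. Here is a minimal Python sketch of the arithmetic (written for this explanation; it is not CPAL's or PAL's generated code):

```python
# Hand-check of the word problem's arithmetic.
obama = 1
bill = bob = ben = beth = obama     # each buys the same number as Obama
cindy = bill + bob                  # 2
boris = ben + beth                  # 2
tim = cindy + boris                 # 4
total = obama + bill + bob + ben + beth + cindy + boris + tim
print(total)  # 13
```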
The CPAL chain represents the causal structure of the above narrative as a causal graph (DAG), which it can also plot.
The two major sections below are:
- Technical overview
- Future application
Also see this jupyter notebook doc.
1. Technical overview
CPAL versus PAL
Like PAL, CPAL intends to reduce large language model (LLM) hallucination.
The CPAL chain is different from the PAL chain for a couple of reasons.
- CPAL adds a causal structure (or DAG) to link entity actions (or math expressions).
- The CPAL math expressions are modeling a chain of cause and effect relations, which can be intervened upon, whereas for the PAL chain math expressions are projected math identities.
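To illustrate the difference, here is a hypothetical sketch (not CPAL's actual implementation or API) of a causal DAG whose math expressions are evaluated in dependency order and can be intervened upon:

```python
# Each node: (dependencies, expression over already-computed values).
# Names and structure are illustrative only.
graph = {
    "obama": ([], lambda v: 1),
    "bill":  (["obama"], lambda v: v["obama"]),
    "bob":   (["obama"], lambda v: v["obama"]),
    "ben":   (["obama"], lambda v: v["obama"]),
    "beth":  (["obama"], lambda v: v["obama"]),
    "cindy": (["bill", "bob"], lambda v: v["bill"] + v["bob"]),
    "boris": (["ben", "beth"], lambda v: v["ben"] + v["beth"]),
    "tim":   (["cindy", "boris"], lambda v: v["cindy"] + v["boris"]),
}

def evaluate(graph, interventions=None):
    """Evaluate nodes in dependency order; interventions override a node's value."""
    interventions = interventions or {}
    values = {}
    remaining = dict(graph)
    while remaining:
        for name, (deps, fn) in list(remaining.items()):
            if all(d in values for d in deps):
                values[name] = interventions.get(name, fn(values))
                del remaining[name]
    return values

print(sum(evaluate(graph).values()))                # 13
print(sum(evaluate(graph, {"obama": 2}).values()))  # 26: intervening on obama
```

Because the expressions form a causal chain rather than independent identities, changing an upstream node propagates correctly to every downstream node.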
PAL's generated Python code is wrong. It hallucinates as complexity increases.

```python
def solution():
    """Tim buys the same number of pets as Cindy and Boris. Cindy buys the same number of pets as Bill plus Bob. Boris buys the same number of pets as Ben plus Beth. Bill buys the same number of pets as Obama. Bob buys the same number of pets as Obama. Ben buys the same number of pets as Obama. Beth buys the same number of pets as Obama. If Obama buys one pet, how many pets total does everyone buy?"""
    obama_pets = 1
    tim_pets = obama_pets
    cindy_pets = obama_pets + obama_pets
    boris_pets = obama_pets + obama_pets
    total_pets = tim_pets + cindy_pets + boris_pets
    result = total_pets
    return result
```
CPAL's generated Python code is correct.

story outcome data

```
   name   code                                   value  depends_on
0  obama  pass                                     1.0  []
1  bill   bill.value = obama.value                 1.0  [obama]
2  bob    bob.value = obama.value                  1.0  [obama]
3  ben    ben.value = obama.value                  1.0  [obama]
4  beth   beth.value = obama.value                 1.0  [obama]
5  cindy  cindy.value = bill.value + bob.value     2.0  [bill, bob]
6  boris  boris.value = ben.value + beth.value     2.0  [ben, beth]
7  tim    tim.value = cindy.value + boris.value    4.0  [cindy, boris]
```
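The table's `code` column can be replayed in dependency order to reproduce the `value` column. A sketch using `exec()` on simple namespaces (illustrative only; not CPAL's internals):

```python
from types import SimpleNamespace

# The (name, code) pairs from the story outcome table above.
rows = [
    ("obama", "pass"),
    ("bill",  "bill.value = obama.value"),
    ("bob",   "bob.value = obama.value"),
    ("ben",   "ben.value = obama.value"),
    ("beth",  "beth.value = obama.value"),
    ("cindy", "cindy.value = bill.value + bob.value"),
    ("boris", "boris.value = ben.value + beth.value"),
    ("tim",   "tim.value = cindy.value + boris.value"),
]

ns = {name: SimpleNamespace(value=0.0) for name, _ in rows}
ns["obama"].value = 1.0          # the hypothetical condition
for _, code in rows:
    exec(code, ns)               # each row only reads nodes it depends on

print([ns[name].value for name, _ in rows])
# [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 4.0]
```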
query data

```json
{
    "question": "how many pets total does everyone buy?",
    "expression": "SELECT SUM(value) FROM df",
    "llm_error_msg": ""
}
```
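The query expression is SQL over the outcome table. Assuming the table is a pandas DataFrame `df` with the `value` column above, the sum reduces to a column sum (CPAL may execute the SQL differently, e.g. through a SQL engine):

```python
import pandas as pd

# The outcome table's value column, as shown above.
df = pd.DataFrame({
    "name": ["obama", "bill", "bob", "ben", "beth", "cindy", "boris", "tim"],
    "value": [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 4.0],
})

# "SELECT SUM(value) FROM df" is equivalent to summing the value column.
answer = df["value"].sum()
print(answer)  # 13.0
```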
Based on the comments below, CPAL's intended location in the library is experimental/chains/cpal, while PAL's location is chains/pal.
CPAL vs Graph QA
Both the CPAL chain and the Graph QA chain extract entity-action-entity relations into a DAG.
The CPAL chain is different from the Graph QA chain for a few reasons.
- Graph QA does not connect entities to math expressions.
- Graph QA does not order actions in a sequence of dependence.
- Graph QA does not decompose the narrative into these three parts:
- Story plot or causal model
- Hypothetical question
- Hypothetical condition
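For the pets problem, that three-part decomposition might look roughly like this (the keys are illustrative, not CPAL's actual schema):

```python
# Illustrative decomposition of the narrative into the three parts
# listed above; key names are hypothetical.
decomposition = {
    "causal_model": [
        "Tim buys the same number of pets as Cindy and Boris.",
        "Cindy buys the same number of pets as Bill plus Bob.",
        "Boris buys the same number of pets as Ben plus Beth.",
        "Bill buys the same number of pets as Obama.",
        "Bob buys the same number of pets as Obama.",
        "Ben buys the same number of pets as Obama.",
        "Beth buys the same number of pets as Obama.",
    ],
    "hypothetical_question": "How many pets total does everyone buy?",
    "hypothetical_condition": "Obama buys one pet.",
}
print(sorted(decomposition))
```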
Evaluation
Preliminary evaluation on simple math word problems shows that this CPAL chain generates less hallucination than the PAL chain on answering questions about a causal narrative. Two examples are in this jupyter notebook doc.
2. Future application
"Plan as Narrative, Test as Code" applications
The thesis here is that the Plan as Narrative, Test as Code approach allows you to represent a project plan both as code and as a narrative, giving you the best of both worlds.
Why Plan as Narrative?
The narrative form is quick. At consensus-building meetings, people use narratives to persuade others of their plan. The narrative form is also scalable: you can share, version-control, and index a narrative, unlike SaaS workflow diagrams.
Why Test as Code?
Though fast, narratives become problematic as their complexity increases. The problem is that both LLMs and humans are prone to hallucination when predicting the outcomes of a narrative. The cost of building consensus around the validity of a narrative's outcome grows with its complexity. This is a culprit in the "tribal knowledge" problem and the "highest-paid person in the room" problem. The Amazon six-pager narrative meeting format attacks this problem, and the Plan as Code concept does likewise. Code does not require tribal knowledge or social power to validate. As narrative complexity increases, the value of representing a plan as code goes up: code is testable; complex narratives are not.
Moreover, code is quickly composable; complex narratives are not. Composability means a plan can be integrated with other project plans and applications: the output of one or more plans can be the input to another team's plan or another computer system. For community voting and funding, composable plans can be integrated with a Dapp. For stochastic simulations, a composable plan can be integrated with the DoWhy library.
In summary, a code representation makes a project plan composable and testable.
An explanation of a future demo app is here.
Twitter handle: @boris_dev
| Name | Status | Updated (UTC) |
|---|---|---|
| langchain | ❌ Failed (Inspect) | Jul 11, 2023 0:11am |
@hwchase17 @vowelparrot
Question: Is it feasible for this PR of a proposed new CPAL chain to get into the library?
I suspect there is a chance it's not because of the following reasons.
- This PR did not originate from an upstream GitHub issue.
- This PR did not originate from an arXiv academic article.
I am checking with you before I start working on the tests and the Jupyter notebook doc.
This is really cool. I'll take a closer look at the code later tonight, but I absolutely think this is something we'd like to include in the library. If it's too complicated, then we could at least put in the experimental/ directory to start cc @hwchase17 @eyurtsev
ps I'm a big fan of the DoWhy library also btw.
super interesting. yeah lets include in experimental for now
@borisdev is attempting to deploy a commit to the LangChain Team on Vercel.
A member of the Team first needs to authorize it.
lint tests and unit tests pass
@borisdev this is awesome and looks pretty good to me!