langchain
Add a method for getting confidence
https://arxiv.org/pdf/2207.05221.pdf
This should be a separate LLMChain with a prompt that people can add on at the end.
Basic prompt/chain:

Question: Who was the first president of the United States?
Answer:

Self-evaluation prompt:

Question: Who was the first president of the United States?
Proposed Answer: George Washington was the first president.
Is the proposed answer:
(A) True
(B) False
The proposed answer is:
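As a concrete sketch, the self-evaluation prompt above could be rendered by a small template helper like the following. This is illustrative only; `SELF_EVAL_TEMPLATE` and `format_self_eval_prompt` are hypothetical names, not an existing LangChain API.

```python
# Hypothetical sketch of the P(True) self-evaluation prompt from
# arXiv:2207.05221; not an existing LangChain API.

SELF_EVAL_TEMPLATE = """Question: {question}
Proposed Answer: {answer}
Is the proposed answer:
(A) True
(B) False
The proposed answer is:"""


def format_self_eval_prompt(question: str, answer: str) -> str:
    """Render the self-evaluation prompt to send to the model.

    The model's next-token distribution over "(A)" vs "(B)" then gives
    a confidence estimate for the proposed answer.
    """
    return SELF_EVAL_TEMPLATE.format(question=question, answer=answer)
```

Packaged as a separate chain, users could append this step after any question-answering chain.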
The paper notes we can "improve performance further by showing the model other T = 1 samples, for comparison":
Question: Who was the third president of the United States?
Here are some brainstormed ideas: James Monroe
Thomas Jefferson
John Adams
Thomas Jefferson
George Washington
Possible Answer: James Monroe
Is the possible answer:
(A) True
(B) False
The possible answer is:
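The multi-sample variant could be sketched the same way, together with a helper that turns the model's logprobs for the "(A)" and "(B)" continuations into a calibrated P(True). All names here (`BRAINSTORM_TEMPLATE`, `format_brainstorm_prompt`, `p_true`) are illustrative assumptions, not an existing API.

```python
import math

# Hypothetical template: the brainstormed T=1 samples are shown so the
# model can compare the candidate against alternatives.
BRAINSTORM_TEMPLATE = """Question: {question}
Here are some brainstormed ideas: {samples}
Possible Answer: {answer}
Is the possible answer:
(A) True
(B) False
The possible answer is:"""


def format_brainstorm_prompt(question: str, samples: list, answer: str) -> str:
    """Render the comparison prompt with one brainstormed sample per line."""
    return BRAINSTORM_TEMPLATE.format(
        question=question, samples="\n".join(samples), answer=answer
    )


def p_true(logprob_a: float, logprob_b: float) -> float:
    """Normalize the logprobs of the "(A)" and "(B)" continuations into
    P(True) -- a two-way softmax over the two answer options."""
    pa, pb = math.exp(logprob_a), math.exp(logprob_b)
    return pa / (pa + pb)
```

With equal logprobs, `p_true` returns 0.5; the more probability mass the model puts on "(A)", the higher the confidence.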
"Overall, if given a few examples from a given distribution, models can generate samples and then self-evaluate them to productively differentiate correct and incorrect samples, with reasonably calibrated confidence."
We should have this play nicely with the self-ask-with-search agent, because a key use case here is tying together (1) the ability to ask itself questions and search the web for facts with (2) the ability to provide well-calibrated responses.
Probabilistically calibrated confidence in a model's answers, combined with drawing information from trusted sources like key websites and databases, would be a major unlock in making agents more usable for important tasks.
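One way that combination might be wired together is a thin wrapper that runs an answerer (e.g. a self-ask-with-search agent), self-evaluates the result, and abstains below a confidence threshold. The `answer_fn`/`confidence_fn` callables and the threshold are assumptions for illustration, not an existing agent interface.

```python
from typing import Callable, Tuple


def answer_with_confidence(
    question: str,
    answer_fn: Callable[[str], str],
    confidence_fn: Callable[[str, str], float],
    threshold: float = 0.8,
) -> Tuple[str, float]:
    """Answer a question, self-evaluate, and abstain when confidence is low.

    answer_fn would be something like a self-ask-with-search agent;
    confidence_fn would be the separate self-evaluation chain that
    returns P(True) for (question, answer).
    """
    answer = answer_fn(question)
    confidence = confidence_fn(question, answer)
    if confidence < threshold:
        # Abstaining is usually preferable to a confident-sounding guess.
        return "I am not confident enough to answer.", confidence
    return answer, confidence
```

Exposing the raw confidence alongside the answer also lets downstream callers pick their own threshold per task.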
Hi, @hwchase17! I'm helping the LangChain team manage their backlog and I wanted to let you know that we are marking this issue as stale.
Based on my understanding, you requested the addition of a method for obtaining confidence. The suggestion was to create a separate LLMChain with a prompt that users can append to. There was also a suggestion to improve performance by showing the model other T = 1 samples for comparison. It was mentioned that the method should play nicely with the self-ask-with-search agent to provide well-calibrated responses and draw information from trusted sources.
Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repo. If you respond to this, the LangChain team will be notified to take a look. Otherwise, the issue will be automatically closed in 7 days. Thank you!